ABSTRACT

The algebraic and geometric structure of certain classes of nonlinear 
stochastic systems is exploited in order to obtain useful stability 
and estimation results. First, the class of bilinear stochastic 
systems (or linear systems with multiplicative noise) is discussed. 
The stochastic stability of bilinear systems driven by colored noise 
is considered; in the case that the system evolves on a solvable Lie 
group, necessary and sufficient conditions for stochastic stability 
are derived. Approximate methods for obtaining sufficient conditions 
for the stochastic stability of bilinear systems evolving on general 
Lie groups are also discussed. 

The study of estimation problems involving bilinear systems is 
motivated by several practical applications involving rotational 
processes in three dimensions. Two classes of estimation problems 
are considered. First it is proved that, for systems described by 
certain types of Volterra series expansions or by certain bilinear 
equations evolving on nilpotent or solvable Lie groups, the optimal 
conditional mean estimator consists of a finite dimensional nonlinear 
set of equations. Finally, the theory of harmonic analysis is used 
to derive suboptimal estimators for bilinear systems driven by white 
noise which evolve on compact Lie groups or homogeneous spaces. 
CHAPTER 1


INTRODUCTION 


1.1 Background and Motivation

The problems of stability analysis and state estimation (or filtering) 
for nonlinear stochastic systems have been the subject of a great deal of 
research over the past several years. Optimal estimators have been
derived for very general classes of nonlinear systems [F1], [K2]. However,
the optimal estimator requires, in general, an infinite dimensional 
computation to generate the conditional mean of the system state given 
the past observations. This computation involves either the solution of 
a stochastic partial differential equation for the conditional density 
or an infinite dimensional system of coupled ordinary stochastic dif- 
ferential equations for the conditional moments. Thus, approximations 
must be made for practical implementation. 

The class of linear stochastic systems with linear observations and 
white Gaussian plant and observation noises has a particularly appealing 
structure, because the optimal state estimator consists of a finite 
dimensional linear system (the Kalman-Bucy filter [K1]), which is easily
implemented in real time with the aid of a digital computer. Many types 
of finite dimensional suboptimal estimators for general nonlinear systems 
have been proposed [W16], [J1], [L1], [N1], [S3], [S7]. These are
primarily based upon linearization and vector space approximations, and 
their performance can be quite sensitive to the particular system under 
consideration. An alternative, but relatively untested, type of
suboptimal estimator is based on the use of cumulants [W12], [N1].
The above considerations lead us to ask two basic questions in the
search for implementable finite dimensional estimators for nonlinear
stochastic systems: 

1) If our objective is to design a suboptimal estimator for a 
particular class of nonlinear systems, is it possible to utilize 
the inherent structure of that class of systems in order to 
design a high-performance estimator? 

2) Do there exist subclasses of nonlinear systems whose inherent 
structure leads to finite dimensional optimal estimators (just 
as the structure of linear systems does in that case)? 

Affirmative answers to these questions can lead not only to com- 
putationally feasible estimators, but also to valuable theoretical insight 
into the underlying structure of estimation for general nonlinear systems. 

There is, in fact, a class of nonlinear systems which possesses a 
great deal of structure — the class of bilinear systems. Several re- 
searchers (see Chapter 2) have developed analytical techniques for such 
systems that are as detailed and powerful as those for linear systems. 
Moreover, the mathematical tools which are useful in bilinear system 
analysis include not only the vector space techniques that are so 
valuable in linear system theory, but also many techniques from the 
theories of Lie groups and differential geometry. In addition, the 
recent work of Brockett, Krener, Hirschorn, Sedwick, and Lo (see Chapter
2) has extended many of these analytical techniques to more general
nonlinear systems. Thus, as emphasized previously by Brockett [B1], [B3] and
Willsky [W2], it is often advantageous to view the dynamical system of
interest in the most natural setting induced by its structure, rather
than to force it into the vector space framework.

In this thesis we will adopt a similar point of view with regard to
stochastic nonlinear systems. We are motivated by the recent work of 
Willsky [W2]-[W6] and Lo [L2]-[L5], who have successfully applied similar 
techniques to some stochastic systems evolving on Lie groups. We will 
investigate the answers to the two basic questions of optimal and sub- 
optimal estimation posed above through the study of stochastic bilinear 
systems and stochastic systems described by certain types of Volterra 
series expansions. Our basic tools are the concepts from the theories 
of Lie groups and Lie algebras and the Volterra series approach of 
Brockett [B25] and Isidori and Ruberti [I1], which are so important in
the deterministic case. In addition, we rely heavily on many results 
from the theories of random processes and stochastic differential 
equations.

In addition to state estimation, stability of stochastic bilinear 
systems is a problem which has been studied by many researchers in recent 
years (see Chapter 3). Using many of the same Lie-theoretic concepts, we 
will also study the stability of bilinear systems driven by colored noise. 

1.2 Problem Descriptions

This research is concerned with the problems of estimation and 
stochastic stability. We first discuss a general nonlinear estimation 
(or filtering) problem [F1], [J1], [K2]. We are given a model in which
the state evolves according to the vector Ito stochastic differential 
equation 
$$ dx(t) = f(x(t),t)\,dt + G(x(t),t)\,dw(t) \tag{1.1} $$

and the observed process is the solution of the vector Ito equation

$$ dz(t) = h(x(t),t)\,dt + R^{1/2}(t)\,dv(t) \tag{1.2} $$

Here x(t) is an n-vector, z(t) is a p-vector, $R^{1/2}$ is the unique positive
definite square root of the positive definite matrix R [B13], and v and
w are independent Brownian motion (Wiener) processes such that

$$ E[w(t)w'(s)] = \int_0^{\min(t,s)} Q(\tau)\,d\tau \tag{1.3} $$

$$ E[v(t)v'(s)] = \min(t,s)\,I \tag{1.4} $$

We will refer to w as a Wiener process with strength Q(t).

The filtering problem is to compute an estimate of the state x(t)
given the observations $z^t \triangleq \{z(s),\ 0 \le s \le t\}$. The optimal estimate with
respect to a wide variety of criteria [J1], including the minimum-variance
(least-squares) criterion

$$ J = E[(x(t)-\hat{x}(t))(x(t)-\hat{x}(t))' \mid z^t] \tag{1.5} $$

is the conditional mean

$$ \hat{x}(t|t) = E^t[x(t)] = E[x(t) \mid z^t] \tag{1.6} $$

Henceforth we will freely interchange the three notations of (1.6) for
the conditional expectation given the σ-field $\sigma\{z(s),\ 0 \le s \le t\}$ generated
by the observation process up to time t. As we will see in Chapter 4, it
is also useful in certain cases to use a "normalized version" of the
conditional mean.
It is well known [F1], [J1], [K2] that the conditional mean satisfies
the Ito equation

$$ d\hat{x}(t|t) = E^t[f(x(t),t)]\,dt + \{E^t[x(t)h'(x(t),t)] - \hat{x}(t|t)E^t[h'(x(t),t)]\}\,R^{-1}(t)\,d\nu(t) \tag{1.7} $$

where the innovations process ν is defined by

$$ d\nu(t) = dz(t) - E^t[h(x(t),t)]\,dt \tag{1.8} $$


However, equation (1.7) cannot be implemented in practice, since it is
not a recursive equation for $\hat{x}(t|t)$. In fact, the right-hand side of (1.7)
involves conditional expectations that require in general the entire
conditional density of x(t) for their evaluation. Thus the differential
equation for the conditional mean $\hat{x}(t|t)$ depends in general on all the
other moments of the conditional distribution, so in order to compute
$\hat{x}(t|t)$ we would have to solve the infinite set of equations satisfied by
the conditional moments of x(t).

If f, G, and h are linear functions of x(t) and x(0) is a Gaussian
random variable independent of v and w, then $\hat{x}(t|t)$ can be computed with
the finite dimensional Kalman-Bucy filter [K1], consisting of (1.7) (which
is linear in this case) and a Riccati equation for the conditional
covariance P(t) (which is nonrandom and can be pre-computed off-line).
Recently, Lo and Willsky [L2] have shown that the filter which computes
$\hat{x}(t|t)$ is finite dimensional in the case that (1.1) consists of a bilinear
system on an abelian Lie group driven by a colored noise process and
(1.2) is a linear observation of ξ(t) (see Chapter 2); also, Willsky
[W4] extended this result to a slightly larger class of systems. In
this thesis, we will extend these results to a much larger class of 
systems, described by bilinear equations evolving on solvable and nil- 
potent Lie groups or by certain types of Volterra series expansions. 
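
To make the remark about off-line precomputation concrete, the following Python sketch integrates a Riccati equation ahead of time. It assumes a scalar, time-invariant model that is not taken from the text: dx = a x dt + dw with w of strength q, and dz = h x dt + r^{1/2} dv. Under that assumption the covariance equation reduces to dP/dt = 2aP + q - h^2 P^2 / r, which involves no observations. The names a, q, h, r, P0 and the fixed-step Runge-Kutta integrator are illustrative choices only.

```python
import math


def riccati_rhs(P, a, q, h, r):
    """Right-hand side of the scalar Riccati equation dP/dt = 2aP + q - (hP)^2 / r."""
    return 2.0 * a * P + q - (h * P) ** 2 / r


def precompute_covariance(P0, a, q, h, r, t_final, steps):
    """Integrate the Riccati equation with classical RK4; no measurements are needed."""
    dt = t_final / steps
    P = P0
    history = [P]
    for _ in range(steps):
        k1 = riccati_rhs(P, a, q, h, r)
        k2 = riccati_rhs(P + 0.5 * dt * k1, a, q, h, r)
        k3 = riccati_rhs(P + 0.5 * dt * k2, a, q, h, r)
        k4 = riccati_rhs(P + dt * k3, a, q, h, r)
        P += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        history.append(P)
    return history


def steady_state_covariance(a, q, h, r):
    """Nonnegative root of 2aP + q - (hP)^2 / r = 0."""
    return r * (a + math.sqrt(a * a + h * h * q / r)) / (h * h)


if __name__ == "__main__":
    table = precompute_covariance(P0=1.0, a=-1.0, q=1.0, h=1.0, r=1.0,
                                  t_final=10.0, steps=1000)
    print(table[-1], steady_state_covariance(-1.0, 1.0, 1.0, 1.0))
```

Because the table depends only on the model parameters, it can be stored and looked up while the filter runs; this is the structural feature that the nonlinear problem generally lacks.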

In the case that the optimal estimator for $\hat{x}(t|t)$ is inherently
infinite dimensional, one must design suboptimal estimators for practical
implementation on a digital computer. As mentioned in Section 1.1, many
researchers have developed suboptimal estimators based upon linearization
and vector space methods. However, motivated by the successful application
of Fourier analysis in the design of nonlinear filters (see Willsky [W6]
and Bucy, et al. [B9]), the work of Ito [I3], Grenander [G4], McKean [M7],
[M8], Yosida [Y1]-[Y3], and others on random processes on Lie groups, and
the successful application of Lie-theoretic ideas to deterministic systems,
we are led to investigate the use of harmonic analysis on Lie groups in
nonlinear estimator design. The basic idea is to exploit the Lie group
structure of certain classes of systems in order to design high-performance 
suboptimal estimators for these systems. 

As with estimation, the problem of the stability of stochastic systems 
has received much attention, and general methods (including Lyapunov 
methods) have been developed. Our approach to stochastic stability will 
be similar to our approach to estimation: we will investigate classes of 
systems (bilinear systems) for which we can use Lie-theoretic concepts 
in order to derive stability criteria. 
1.3 Synopsis


We now present a brief summary of the thesis. In Chapter 2 we review
some of the important results for deterministic bilinear systems, and we 
discuss stochastic bilinear systems in more detail. Chapter 3 is concerned 
with stochastic stability of bilinear systems, primarily those driven by
colored noise. Exact stability criteria are presented for bilinear systems 
evolving on solvable Lie groups, and approximate techniques for other 
cases are discussed. In Chapter 4 we present some stochastic bilinear 
models which relate to the problem of the estimation of rotational processes 
in three dimensions; these models serve as one motivation for the estimation 
techniques discussed in Chapters 5 and 6. In Chapter 5 we consider classes 
of systems for which the optimal conditional mean estimator consists of a 
finite dimensional nonlinear system of stochastic differential equations 
(the major results are proved in Appendix D). We also discuss a class of
suboptimal estimators which are motivated by these results. In Chapter 6 
we investigate the use of harmonic analysis techniques in the design of 
suboptimal filters for bilinear systems evolving on compact Lie groups 
and homogeneous spaces. 

In Chapter 7 we summarize the results contained in this thesis and
suggest some possible research directions which are motivated by this 
research. In addition, four appendices are included to supplement the 
discussions presented in the thesis. Appendix A contains a summary of 
the relevant results from algebra and differential geometry. In Appendix
B we review the theory of harmonic analysis on compact Lie groups, which 
is used primarily in Chapter 6. Appendix C contains a proof of a version 
of Fubini's theorem which is used in Chapter 5 and Appendix D. Finally, 
Appendix D contains the proofs of the major results in Chapter 5. 
CHAPTER 2


BILINEAR SYSTEMS 


2.1 Deterministic Bilinear Systems 

The basic deterministic bilinear equation studied in the literature
[B1]-[B5], [D1], [H1], [I1], [J3], [M5], [M6], [S6] is

$$ \dot{x}(t) = \Big[A_0 + \sum_{i=1}^{N} u_i(t) A_i\Big] x(t) \tag{2.1} $$

where the $A_i$ are given nxn matrices, the $u_i$ are scalar inputs, and x is
either an n-vector or an nxn matrix. As discussed in [B1], the additive
control model 


$$ \dot{x}(t) = \Big[B_0 + \sum_{i=1}^{N} u_i(t) B_i\Big] x(t) + Cu(t) \tag{2.2} $$

(here u is the vector of the $u_i$) can be reduced to the form (2.1) by
state augmentation. As the many examples in the above references
illustrate, bilinear system models occur quite naturally in the
consideration of a variety of physical phenomena.
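
The text does not spell out the state augmentation. One standard construction, assumed here, appends a constant component equal to 1 to the state, so that z = (x, 1) obeys a homogeneous bilinear equation whose matrices carry the columns of C in their last column. The Python sketch below builds those matrices and checks that both right-hand sides agree; the helper names (augment, rhs_additive, rhs_bilinear) are hypothetical.

```python
def mat_vec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def augment(B0, Bs, C):
    """Return (A0, As) acting on z = (x, 1); C is n x N with N = len(Bs)."""
    n = len(B0)
    zero_row = [0.0] * (n + 1)
    A0 = [row[:] + [0.0] for row in B0] + [zero_row[:]]
    As = []
    for i, Bi in enumerate(Bs):
        # The i-th column of C becomes the last column of the i-th augmented matrix.
        Ai = [Bi[k][:] + [C[k][i]] for k in range(n)] + [zero_row[:]]
        As.append(Ai)
    return A0, As


def rhs_additive(B0, Bs, C, x, u):
    """Right-hand side of the additive control model: [B0 + sum u_i B_i] x + C u."""
    out = mat_vec(B0, x)
    for ui, Bi in zip(u, Bs):
        out = [o + ui * w for o, w in zip(out, mat_vec(Bi, x))]
    return [o + c for o, c in zip(out, mat_vec(C, u))]


def rhs_bilinear(A0, As, z, u):
    """Right-hand side of the homogeneous bilinear model: [A0 + sum u_i A_i] z."""
    out = mat_vec(A0, z)
    for ui, Ai in zip(u, As):
        out = [o + ui * w for o, w in zip(out, mat_vec(Ai, z))]
    return out


if __name__ == "__main__":
    B0 = [[0.0, 1.0], [-1.0, 0.0]]
    Bs = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
    C = [[1.0, 0.0], [0.0, 3.0]]
    x, u = [0.5, -2.0], [0.3, -0.7]
    A0, As = augment(B0, Bs, C)
    print(rhs_additive(B0, Bs, C, x, u), rhs_bilinear(A0, As, x + [1.0], u))
```

The last component of the augmented right-hand side is identically zero, so the appended coordinate stays at 1 along every trajectory.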


The analysis of bilinear systems requires some concepts from the 
theory of Lie groups and Lie algebras. The relevant results are summarised 
in Appendix A. 

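Associated with the bilinear system (2.1) are three Lie algebras:

$$ \mathscr{L} = \{A_0, A_1, \ldots, A_N\}_{LA}, \qquad \mathscr{B} = \{A_1, \ldots, A_N\}_{LA}, \qquad \mathscr{L}_0 = \{\mathrm{ad}_{A_0}^{i} A_j,\ i = 0, 1, \ldots;\ j = 1, \ldots, N\}_{LA} \tag{2.3} $$

Notice that $\mathscr{B} \subset \mathscr{L}_0 \subset \mathscr{L}$; in fact, $\mathscr{L}_0$ is the ideal in $\mathscr{L}$ generated by
$\{A_1, \ldots, A_N\}$. We also define the corresponding connected Lie groups

$$ G = \{\exp \mathscr{L}\}_G, \qquad G_0 = \{\exp \mathscr{L}_0\}_G, \qquad B = \{\exp \mathscr{B}\}_G \tag{2.4} $$

Then $B \subset G_0 \subset G$, and $G_0$ is a normal subgroup of G [J3],[S1].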

The relevance of these Lie groups and Lie algebras to the analysis
of (2.1) is illuminated by first considering the case in which x is an
nxn matrix. It can easily be shown [J3] that if $x(t_0) \in G$, then $x(t) \in G$
for all $t \ge t_0$. In other words, the bilinear system evolves on the Lie
group G. If x is an n-vector, then the solution to (2.1) is given by

$$ x(t) = X(t)x(0) \tag{2.5} $$

where the transition matrix X(t) satisfies

$$ \dot{X}(t) = \Big[A_0 + \sum_{i=1}^{N} u_i(t) A_i\Big] X(t); \qquad X(0) = I \tag{2.6} $$

i.e., X(t) evolves on G. Thus the evolution of x(t) is governed by the
action [W11] of the Lie group G on x(0), as defined in (2.5)-(2.6). In
addition, the Lie algebras defined above are intimately related to the
controllability of (2.1), as described in [B1], [H1], [J3].
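
As a small numerical illustration of a transition matrix evolving on a Lie group, the Python sketch below takes all the matrices in the transition-matrix equation to be 3x3 skew-symmetric, so that the generated Lie algebra lies in so(3) and X(t) should remain a rotation matrix. The setup is an assumption made for the example: the inputs are piecewise constant, A_1, A_2, A_3 are the standard basis of so(3), A_0 is the skew matrix of a fixed vector omega0, and each step is computed exactly with Rodrigues' formula. None of the function names come from the text.

```python
import math


def hat(w):
    """Skew-symmetric matrix of a 3-vector, so that hat(w) v = w x v."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def expm_so3(w, dt):
    """exp(hat(w) * dt) by Rodrigues' formula."""
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    norm = math.sqrt(sum(wk * wk for wk in w))
    if norm == 0.0:
        return eye
    theta = norm * dt
    K = hat([wk / norm for wk in w])
    K2 = mat_mul(K, K)
    s, one_minus_c = math.sin(theta), 1.0 - math.cos(theta)
    return [[eye[i][j] + s * K[i][j] + one_minus_c * K2[i][j] for j in range(3)]
            for i in range(3)]


def propagate(omega0, inputs, dt):
    """Transition matrix for piecewise-constant inputs, starting from X(0) = I."""
    X = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for u in inputs:
        w = [omega0[k] + u[k] for k in range(3)]  # A_0 + sum_i u_i A_i = hat(w)
        X = mat_mul(expm_so3(w, dt), X)
    return X


def orthogonality_error(X):
    """Largest entry of |X'X - I|."""
    Xt = [[X[j][i] for j in range(3)] for i in range(3)]
    P = mat_mul(Xt, X)
    return max(abs(P[i][j] - (1.0 if i == j else 0.0))
               for i in range(3) for j in range(3))


if __name__ == "__main__":
    inputs = [[math.sin(0.1 * k), math.cos(0.2 * k), 0.5] for k in range(200)]
    X = propagate([0.3, -0.2, 0.1], inputs, 0.05)
    print(orthogonality_error(X))
```

A general-purpose integrator such as Euler's method would let X drift off the group; stepping with group elements keeps the computed X orthogonal up to rounding.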


One important aspect of the research done so far by other researchers
deals with the relationship between bilinear systems and more general
nonlinear systems. Consider the nonlinear system

$$ \dot{x}(t) = a_0(x(t)) + \sum_{i=1}^{N} a_i(x(t))\,u_i(t); \qquad x(0) = x_0 \tag{2.7} $$

$$ y(t) = c(x(t)) \tag{2.8} $$

where c and $a_i$, i = 0,1,...,N, are analytic functions of x in some
neighborhood of the free response. Such systems are called linear-analytic.
Brockett [B25] shows that, under very general conditions, the output of
a linear-analytic system has a Volterra series expansion (this will be
discussed in more detail in Chapter 5). Isidori and Ruberti [I1] have
derived conditions on the Volterra kernels under which the Volterra series
is realizable with a finite dimensional bilinear system.

Krener [K6]-[K9], Hirschorn [H1],[H2], and Sedwick [S11], [S12] have
developed an alternative approach, which we will refer to as the
"bilinearization" of nonlinear systems. We require some preliminary
definitions [W11] in order to describe this approach.

Definition 2.1: Let M be a differentiable manifold, $M_x$ the tangent
space to M at $x \in M$, and $T(M) = \bigcup_{x \in M} M_x$ the tangent bundle of M. A
smooth (analytic) vector field on an open set U in M is a $C^\infty$ (analytic)
map $f: U \to T(M)$ such that $\pi \circ f$ = identity map on U, where π is the
projection from T(M) onto M. A smooth curve $\phi_{x_0}(t)$ in M is the integral
curve of f through $x_0$ if it is the solution of the differential equation
$\dot{x}(t) = f(x(t))$, $x(0) = x_0$. If for every $x_0 \in M$, $\phi_{x_0}(t)$ exists for all
$t \in R$, then f is complete. In this case, if we define $f_t(x_0) = \phi_{x_0}(t)$,
the collection $\{f_t,\ t \in R\}$ of maps from M to M is called the 1-parameter
group of f. Each $f_t$ is an element of diff(M), the group of diffeomorphisms
from M to M.

Consider the nonlinear system (2.7) where $x \in M$ and $a_0, \ldots, a_N$ are
analytic vector fields. We define the Lie bracket of two vector fields
to be the vector field

$$ [a_i, a_j](x) = a_i(x)\,a_j - a_j(x)\,a_i $$

If $M = R^n$ we identify $M_x$ with $R^n$ for all $x \in R^n$ and

$$ [a_i, a_j](x) = \frac{\partial a_j}{\partial x}(x)\,a_i(x) - \frac{\partial a_i}{\partial x}(x)\,a_j(x) $$

where $(\partial a_i/\partial x)(x)$ is the Jacobian matrix of the map $a_i: R^n \to R^n$. The
Lie algebra generated by $a_0, \ldots, a_N$ under this Lie bracket is denoted by

$$ \mathscr{L} = \{a_0, a_1, \ldots, a_N\}_{LA} $$

Krener [K7] shows that if $\mathscr{L}$ is finite dimensional and certain other
technical conditions are satisfied, then there exists an equivalent
bilinear system which preserves the solutions of (2.7) locally (i.e.,
for small t). He also shows that, even if $\mathscr{L}$ is infinite dimensional,
(2.7) can be approximated by a bilinear system, with the error
between the solutions growing proportionately to an arbitrary power of t.


Hirschorn [H1], [H2], employing an important result of Palais [P3],
proves a global bilinearization result. Given the system (2.7), where
$x \in M$ and $a_0, \ldots, a_N$ are analytic vector fields, we define
$D = \{a_0 + \sum_{i=1}^{N} \alpha_i a_i,\ \alpha_i \in R\}$ and consider the subset of diff(M)

$$ G(D) = \{f^1_{t_1} \circ f^2_{t_2} \circ \cdots \circ f^k_{t_k};\ f^i \in D,\ t_i \in R,\ k = 1, 2, \ldots\} \tag{2.9} $$

where $f_t$ is the 1-parameter group of f. Notice that $\mathscr{L} = \{a_0, \ldots, a_N\}_{LA}$.
Palais shows that if $\mathscr{L}$ is finite dimensional, then G(D) can be given the
structure of a connected Lie group G with Lie algebra $\mathscr{L}(G)$ isomorphic to
$\mathscr{L}$. If, in addition, G is isomorphic to a matrix Lie group, then there
exists a bilinear system of the form (2.6) such that the solution x(t)
of (2.7) is given by

$$ x(t) = X(t)(x(0)) \tag{2.10} $$

where $X(t) \in G$.
The action of X(t) on x(0) in (2.10) need not be via matrix-vector
multiplication; in general, this action can be highly nonlinear. However,
if M and G are compact, another result of Palais [P5] shows that there
exists a finite dimensional orthogonal representation D of G (see Appendix
B) on the space $R^m$ (for some m) and an imbedding $\psi: M \to R^m$ (see [W11],
p. 22) such that

$$ \psi(X(t)(x(0))) = D(X(t))\,\psi(x(0)) \tag{2.11} $$

where the action on the right-hand side of (2.11) is given by matrix-
vector multiplication. In this case, (2.7) can be solved by solving the
bilinear system (2.6) for the orthogonal matrix D(X(t)), performing the
multiplication in (2.11), and recovering X(t)x(0) via the 1-1 mapping ψ.
The basic idea here is to "lift" the problem onto a Lie transformation
group acting on M which evolves according to a bilinear system (see [H1],
[P3], [P4]; this is also related to the recent work of Krener [K11]).

These techniques reveal the generality of deterministic bilinear models. 

2.2 Stochastic Bilinear Systems

Stochastic bilinear systems are described by equations such as (2.1),
in which the $u_i$ are stochastic processes. Such systems have been
considered by many authors [B3]-[B7], [C5], [E2], [E3], [I3], [J2], [K3]-[K5],
[L2]-[L5], [M8], [M14], [S4], [S5], [W1]-[W7]. In considering stochastic
versions of (2.1), one must be careful to use the appropriate stochastic
calculus. For instance, if u(t) is a vector zero-mean white noise with

$$ E[u(t)u'(s)] = Q(t)\,\delta(t-s) $$

then the Ito stochastic differential analogue of (2.1) is

$$ dx(t) = \Big\{\Big[A_0 + \frac{1}{2}\sum_{i,j=1}^{N} Q_{ij}(t) A_i A_j\Big]dt + \sum_{i=1}^{N} A_i\,dw_i(t)\Big\} x(t) \tag{2.12} $$

where $Q_{ij}$ is the (i,j)th element of Q and w is the integral of u; i.e.,
w is a Brownian motion (Wiener) process with strength Q(t) such that

$$ E[w(t)w'(s)] = \int_0^{\min(t,s)} Q(\tau)\,d\tau \tag{2.13} $$


Equation (2.12) can be derived in two ways. First, if x is an nxn
matrix, (2.12) can be viewed as a generalization of McKean's injection
of a Brownian motion into a matrix Lie group [M7], [W1], [L4]. Equation
(2.12) can also be obtained from (2.1) by adding the appropriate
Wong-Zakai correction term [W9],[W10], which in this case is

$$ \frac{1}{2}\sum_{i,j=1}^{N} Q_{ij}(t) A_i A_j\,x(t)\,dt \tag{2.14} $$

The addition of this correction term, which transforms the Stratonovich
equation into the corresponding Ito equation, ensures that (in the case
that x is an nxn matrix) x(t) will evolve on $G = \{\exp \mathscr{L}\}_G$ in the
mean-square sense and almost surely [L4], [L5], [M7].


Associated with the Ito equation (2.12) is a sequence of equations
for the moments of the state x(t), first derived by Brockett [B3],[B4].
We will assume first that x is an n-vector satisfying (2.12). Recall
that the number of linearly independent homogeneous polynomials of
degree p in n variables (i.e., $f(cx_1, \ldots, cx_n) = c^p f(x_1, \ldots, x_n)$) is given
by

$$ N(n,p) = \binom{n+p-1}{p} = \frac{(n+p-1)!}{(n-1)!\,p!} \tag{2.15} $$
We choose a basis for this N(n,p)-dimensional space of homogeneous
polynomials in $(x_1, \ldots, x_n)$ consisting of the elements

$$ \sqrt{\frac{p!}{p_1!\,p_2! \cdots p_n!}}\; x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n}; \qquad \sum_{i=1}^{n} p_i = p,\quad p_i \ge 0 \tag{2.16} $$

If we denote the vector consisting of these elements (ordered
lexicographically) by $x^{[p]}$, then $x^{[p]}$ is a symmetric tensor of degree p [H5],
and

$$ \|x^{[p]}\| = \|x\|^p \tag{2.17} $$

where $\|x\| = \sqrt{x'x}$. It is clear that if x satisfies the linear
differential equation

$$ \dot{x}(t) = Ax(t) \tag{2.18} $$

then $x^{[p]}$ satisfies a linear differential equation

$$ \dot{x}^{[p]}(t) = A_{[p]}\,x^{[p]}(t) \tag{2.19} $$
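
The construction of the vector of weighted monomials can be checked numerically. The Python sketch below assumes the usual normalization attributed to Brockett, in which the monomial with exponents (p_1, ..., p_n) is weighted by the square root of the multinomial coefficient p!/(p_1! ... p_n!). With that weighting the vector has N(n,p) entries and its Euclidean norm equals the pth power of the norm of x. The function names and the particular ordering of exponents are illustrative only.

```python
import math
from itertools import product


def number_of_monomials(n, p):
    """N(n, p): number of independent homogeneous polynomials of degree p in n variables."""
    return math.comb(n + p - 1, p)


def exponents(n, p):
    """All exponent tuples (p_1, ..., p_n) with p_i >= 0 and p_1 + ... + p_n = p."""
    return [e for e in product(range(p, -1, -1), repeat=n) if sum(e) == p]


def symmetric_power(x, p):
    """Vector of weighted monomials of degree p in the entries of x."""
    out = []
    for e in exponents(len(x), p):
        multinomial = math.factorial(p)
        for pi in e:
            multinomial //= math.factorial(pi)
        monomial = math.prod(xi ** pi for xi, pi in zip(x, e))
        out.append(math.sqrt(multinomial) * monomial)
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    v = symmetric_power(x, 3)
    print(len(v), number_of_monomials(3, 3))
    print(sum(c * c for c in v), sum(c * c for c in x) ** 3)
```

The norm identity is the multinomial theorem applied to (x_1^2 + ... + x_n^2)^p, which is why the square-root weights appear.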


The matrix $A_{[p]}$ can be easily computed from A (see Blankenship [B26]),
and in fact $A_{[p]}$ is a linear function of A (so that $(\alpha A + B)_{[p]} = \alpha A_{[p]} + B_{[p]}$).
For an interpretation of $A_{[p]}$ as an infinitesimal linear operator on
symmetric tensors of degree p, see [B3],[G6]; $A_{[p]}$ is also related to the
concept of Kronecker sum matrices [B13]. We note only that the eigenvalues
of $A_{[p]}$ are all possible sums of p (not necessarily distinct) eigenvalues
of A.

Brockett has shown that if x satisfies (2.12), then $x^{[p]}$ satisfies
the Ito equation

$$ dx^{[p]}(t) = \Big\{A_{0[p]} + \frac{1}{2}\sum_{i,j=1}^{N} Q_{ij} A_{i[p]} A_{j[p]}\Big\} x^{[p]}(t)\,dt + \sum_{i=1}^{N} A_{i[p]}\,x^{[p]}(t)\,dw_i(t) \tag{2.20} $$

Taking expected values, we get the linear pth moment equation

$$ \frac{d}{dt}E[x^{[p]}(t)] = \Big\{A_{0[p]} + \frac{1}{2}\sum_{i,j=1}^{N} Q_{ij} A_{i[p]} A_{j[p]}\Big\} E[x^{[p]}(t)] \tag{2.21} $$

Moment equations can also be derived for the case of an nxn matrix X
satisfying (2.12). We denote by $A^{[p]}$ the matrix which verifies

$$ y = Ax \;\Rightarrow\; y^{[p]} = A^{[p]} x^{[p]} \tag{2.22} $$

Some properties of the matrix $A^{[p]}$ are given in [B3]. $A^{[p]}$ can be
interpreted as a linear operator on symmetric tensors of degree p [B3],
[G6], and is known as the symmetrized Kronecker pth power of A [M16]. In
fact, $A_{[p]}$ is the infinitesimal version of $A^{[p]}$.

If X satisfies (2.12), it is easy to show that $X^{[p]}$ also satisfies
(2.20) and (2.21). The analysis of the moment equations for x and X is
useful in studies of both estimation and stochastic stability for the
bilinear equation (2.12), because the infinite sequence of moments contains
precisely the same information as the probability distribution of x or X
(if the moments are bounded and the series of moments converges absolutely
[P4, p. 157]).

Another case of considerable importance arises if u in (2.1) is a
colored noise generated by a finite dimensional linear stochastic
differential equation

$$ d\xi(t) = F(t)\xi(t)\,dt + G(t)\,dw(t) + a(t)\,dt \tag{2.23} $$

$$ u(t) = H(t)\xi(t) \tag{2.24} $$

where a, F, G, and H are known and w is a standard Wiener process (i.e.,
a Wiener process with strength I). In this case, there is no correction
term added to (2.1), because u is "smoother" than white noise. As in the
deterministic case, x evolves on the Lie group G. Notice that x by itself
is not a Markov process, but the augmented process y = (x, ξ) is. The
equation for y is then described by (2.1), (2.23), (2.24); it obviously
involves products of the state variables x and ξ. Thus it does not
satisfy the Lipschitz and growth conditions usually assumed in proving
the existence and uniqueness of solutions to Ito stochastic differential
equations [J1],[W8]. However, Martin [M1] has proved the existence and
uniqueness (in the mean-square sense) of solutions to (2.1) driven by a
scalar colored noise; the extension to the vector case is straightforward.

In Chapters 4-6, we will consider the estimation of processes 
described by stochastic bilinear equations of the types just discussed. 

We now briefly describe the types of measurement processes that will be 
considered. 

One very important measurement process consists of linear
measurements corrupted by additive noise

$$ dz(t) = L(x(t), \xi(t))\,dt + dv(t) \tag{2.25} $$

where L is a linear operator (recall x is either an n-vector or an nxn
matrix and ξ is a vector) and v is a Wiener process. The important
implications of linear measurements for bilinear systems will be discussed 
at length in Chapters 5 and 6. In addition, the bilinear system-linear 
observation model of (2.12), (2.25) is general enough to include a model 
with the bilinear system (2.12) and observations 
$$ dy(t) = L_p(x(t), x(t), \ldots, x(t))\,dt + R^{1/2}(t)\,dv(t) \tag{2.26} $$

where $L_p$ is a p-linear map. In this case, we can (following Brockett [B2]
in the deterministic case) define the augmented state vector

$$ \tilde{x} = (x', x^{[2]\prime}, \ldots, x^{[p]\prime})' \tag{2.27} $$

which will again satisfy a bilinear equation (see (2.20)). However, the
observation equation is now linear in the state $\tilde{x}$.

A second observation model is the "multiplicative noise case" 


$$ Z(t) = X(t)V(t) \tag{2.28} $$

in which Z, X, and V are all nxn matrices. Examples of physical systems
which can be modeled as bilinear systems with observations described by
(2.25) or (2.28) will be discussed in Chapter 4. However, the development
of estimation techniques in Chapters 5 and 6 will be limited to the linear
observation processes (and their generalizations, as discussed above). In
Chapter 5, we will derive finite dimensional estimators for certain classes 
of bilinear systems driven by colored noise. 

As an example of the type of estimation problem we will consider in 
Chapter 6, suppose the n-vector x satisfies the stochastic bilinear 
equation (2.12) with Q(t) = I, and the linear observations are of the 
form 

$$ dz(t) = H(t)x(t)\,dt + dv(t) \tag{2.29} $$

where v is a Wiener process of strength R(t). Then the nonlinear
filtering equation (1.7) and the moment equation (2.20) yield

$$ dE^t[x^{[p]}(t)] = \Big[A_{0[p]} + \frac{1}{2}\sum_{i=1}^{N} A_{i[p]}^2\Big] E^t[x^{[p]}(t)]\,dt + \{E^t[x^{[p]}(t)x'(t)] - E^t[x^{[p]}(t)]E^t[x'(t)]\}\,H'(t)R^{-1}(t)\,d\nu(t) \tag{2.30} $$

where the innovations process is given by

$$ d\nu(t) = dz(t) - H(t)E^t[x(t)]\,dt \tag{2.31} $$

The filter which computes $\hat{x}(t|t)$ is obviously infinite-dimensional in
general, since the equation for $E^t[x^{[p]}(t)]$ is coupled to the equation
for $E^t[x^{[p+1]}(t)]$. The design of suboptimal filters for the case in
which x evolves on a compact Lie group or homogeneous space will be
discussed in Chapter 6.


CHAPTER 3 


STABILITY OF STOCHASTIC BILINEAR SYSTEMS 
3.1 Introduction 

The stability of stochastic bilinear systems has been investigated 
recently by Brockett [B3], [B4], Willems [W21]-[W23], Blankenship [B6],
[B7], [B26], and Martin [M1] (Martin's thesis also contains a good
summary of previous work on this subject). Many definitions of stochastic 
stability are used by these authors, but we will consider only the 
following definition for bilinear systems with white noise (equation 
(2.12)) or colored noise (equation (2.1)), in which x is an n-vector. 

Definition 3.1: A vector random process x is pth order stable if
$E[x^{[p]}(t)]$ is bounded for all t, and x is pth order asymptotically
stable if

$$ \lim_{t \to \infty} E[x^{[p]}(t)] = 0 \tag{3.1} $$

The bilinear systems (2.1) and (2.12) are pth order (asymptotically)
stable if the solution x is pth order (asymptotically) stable for all
initial conditions x(0) independent of the $u_i$ (in (2.1)) or the $w_i$
(in (2.12)) and such that $E[x^{[p]}(0)] < \infty$.

We first consider the white noise case (2.12). Since the 
moment equation (2.21) is linear, the usual stability results for linear 
systems [B8], [C2] immediately yield the following theorem. 

Theorem 3.1: The system (2.12) with Q(t) = I is pth order
asymptotically stable if and only if the matrix

$$ D_p = A_{0[p]} + \frac{1}{2}\sum_{i=1}^{N} (A_{i[p]})^2 \tag{3.2} $$

has all its eigenvalues in the left half plane (Re λ < 0). The system is
pth order stable if all the eigenvalues of $D_p$ have negative or zero real
parts, and if λ is an eigenvalue with Re(λ) = 0, then λ is a simple zero
of the minimal polynomial of $D_p$.


The explicit computation of the eigenvalues of $D_p$ in terms of
$A_0, A_1, \ldots, A_N$ is an unsolved problem in the general case. However,
Brockett [B4] has shown that if $A_0, A_1, \ldots, A_N$ are all skew-symmetric
and (2.1) is controllable on the sphere $S^{n-1}$, then the solution of
(2.12) is such that all moments approach the moments associated with
the uniform distribution on $S^{n-1}$ as t approaches infinity. He has also
shown [B3] that in the scalar case (n = N = 1) it is not possible for
(2.12) to be pth order stable for all p (assuming that $A_1 \ne 0$).
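
The scalar statement can be seen directly from the moment equation. For n = N = 1 and unit noise strength, the pth symmetric power of x is simply x^p, the matrix induced by a scalar A on that power is pA, and the pth moment therefore grows or decays at the rate p a0 + (1/2) p^2 a1^2, where a0 and a1 stand for the scalars A_0 and A_1. The quadratic term eventually dominates whenever a1 is nonzero. The Python sketch below tabulates this rate; the function names are illustrative and do not appear in the text.

```python
def moment_rate(a0, a1, p):
    """Exponential rate of E[x(t)^p] for the scalar white-noise bilinear system."""
    return p * a0 + 0.5 * (p * a1) ** 2


def first_non_decaying_order(a0, a1, p_max=10000):
    """Smallest p whose moment rate is >= 0, or None if no such p is found up to p_max."""
    for p in range(1, p_max + 1):
        if moment_rate(a0, a1, p) >= 0.0:
            return p
    return None


if __name__ == "__main__":
    a0, a1 = -1.0, 0.5
    for p in range(1, 10):
        print(p, moment_rate(a0, a1, p))
    print(first_non_decaying_order(a0, a1))
```

With a0 = -1 and a1 = 0.5, the moments of order 1 through 7 decay, the 8th moment has rate exactly zero, and all higher moments grow, so asymptotic stability holds only for p < -2 a0 / a1^2.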

Willems [W22] has derived explicit necessary and sufficient conditions
in terms of the eigenvalues of $A_0, A_1, \ldots, A_N$ for the pth order asymptotic
stability of (2.12) in the case that $\mathscr{L} = \{A_0, A_1, \ldots, A_N\}_{LA}$ is solvable (see
Section A.3). However, this has not been accomplished in the general
case (or, for example, if $\mathscr{L}$ is semisimple).

In the next section, we present a procedure for obtaining necessary
and sufficient conditions for pth order (asymptotic) stability of the
system (2.1) driven by colored noise, for the special case in which $\mathscr{L}$
is solvable. In Section 3.3 we discuss some approximate techniques
for deriving sufficient conditions for stability in the case that $\mathscr{L}$
is not solvable.
3.2 Bilinear Systems with Colored Noise: The Solvable Case

In this section we analyze the stability of the bilinear system 
(2.1) driven by a colored noise process u. Assume that x is an n-vector 
and u is a Gaussian random process independent of x(0) with 

$$ E[u(t)] = m(t) \tag{3.3} $$

$$ E[(u(t)-m(t))(u(s)-m(s))'] = P(t,s) \tag{3.4} $$

The purpose of this section is to show that necessary and sufficient
conditions for pth order (asymptotic) stability can be derived if
$\mathscr{L} = \{A_0, A_1, \ldots, A_N\}_{LA}$ is solvable. We first outline one general procedure
for determining these conditions, and then present several examples to 
illustrate the method. 

As noted in Chapter 2, we can write the solution to (2.1) in terms
of the transition matrix X via (2.5)-(2.6). If $\mathcal{L}$ is solvable, we can
derive a closed-form expression for X in terms of u. The first work on
the derivation of closed-form expressions for the solution of (2.1) in
the solvable case was done by Wei and Norman [W14], [W15]. Martin [M1]
used their results to calculate stochastic stability conditions in the
solvable case. Our alternate, but computationally equivalent, approach
proceeds as follows.

First we make use of Lemma A.1, which proves the existence of a
(possibly complex-valued) nonsingular matrix P such that $B_i = P A_i P^{-1}$
is in upper triangular form for i = 0, 1, ..., N. Then the equation

$$ \dot{Y}(t) = \Big[ B_0 + \sum_{i=1}^{N} B_i u_i(t) \Big] Y(t); \qquad Y(0) = I \tag{3.5} $$

can be solved in closed form by quadrature. Consequently

$$ X(t) = P^{-1} Y(t) P \tag{3.6} $$

and X involves only exponentials and polynomials in the integrals of the
components of u (see Example 3.3). Since u is Gaussian and independent of
x(0), the expectations of the components of X can be evaluated in closed
form (see the examples). Hence

$$ E[x(t)] = E[X(t)] \, E[x(0)] \tag{3.7} $$

can be evaluated in closed form, and we can determine necessary and
sufficient conditions for first order stability.
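
The phrase "solved in closed form by quadrature" can be made concrete for a 2x2 upper triangular system. The following is only a minimal numerical sketch, not part of the original development: the function names, the sampling grid, and the use of the trapezoidal rule are all assumptions made for illustration. For Y' = [[b11, b12], [0, b22]] Y with Y(0) = I, the diagonal entries are exponentials of integrals, and the off-diagonal entry is a single integral of products of such exponentials.

```python
import math

def cumtrapz(samples, t_grid):
    """Cumulative trapezoidal integral of `samples` over `t_grid`, starting at 0."""
    out = [0.0]
    for k in range(1, len(t_grid)):
        h = t_grid[k] - t_grid[k - 1]
        out.append(out[-1] + 0.5 * h * (samples[k] + samples[k - 1]))
    return out

def solve_triangular_2x2(b11, b12, b22, t_grid):
    """Return (Y11, Y12, Y22) at the final time for
    Y' = [[b11, b12], [0, b22]] Y, Y(0) = I.

    b11, b12, b22 are lists of samples of the (time-varying) entries on t_grid.
    This helper is hypothetical; it only illustrates the quadrature structure.
    """
    i11 = cumtrapz(b11, t_grid)  # integral of b11 from 0 to t_k
    i22 = cumtrapz(b22, t_grid)  # integral of b22 from 0 to t_k
    y11 = math.exp(i11[-1])
    y22 = math.exp(i22[-1])
    # Y12(t) = int_0^t exp(int_tau^t b11) * b12(tau) * exp(int_0^tau b22) dtau
    integrand = [
        math.exp(i11[-1] - i11[k] + i22[k]) * b12[k]
        for k in range(len(t_grid))
    ]
    y12 = cumtrapz(integrand, t_grid)[-1]
    return y11, y12, y22
```

With constant entries the result can be compared against the elementary formula Y12(t) = b12 (e^{b11 t} - e^{b22 t}) / (b11 - b22), which is a convenient sanity check on the quadrature.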

In order to determine conditions for pth order stability, we
consider the equation for $x^{[p]}$:

$$ \frac{d}{dt} x^{[p]}(t) = \Big[ A_{0[p]} + \sum_{i=1}^{N} A_{i[p]} u_i(t) \Big] x^{[p]}(t) \tag{3.8} $$

Let

$$ \mathcal{L}_{[p]} = \{ A_{0[p]}, A_{1[p]}, \ldots, A_{N[p]} \}_{LA} \tag{3.9} $$

Since [B3]

$$ [A, B]_{[p]} = [A_{[p]}, B_{[p]}] \tag{3.10} $$

we see that $\mathcal{L}_{[p]}$ is solvable if and only if $\mathcal{L}$ is. Therefore, we can use
the preceding analysis to determine first order stability conditions for
(3.8) (i.e., pth order stability conditions for the original system
(2.1)).


Example 3.1 [M1], [B8, p. 58]: Consider the scalar system

$$ \dot{x}(t) = (a + u(t)) x(t) \tag{3.11} $$

where a is constant and u is a Gaussian random process with

$$ E[u(t)] = 0 \tag{3.12} $$

$$ E[u(t) u(t+\tau)] = \sigma^2 e^{-\alpha|\tau|} \tag{3.13} $$

where $\alpha > 0$. The solution to (3.11) is

$$ x(t) = e^{at + \eta(t)} x(0) $$

$$ \eta(t) = \int_0^t u(s)\,ds \tag{3.14} $$

Recall [M11] that the characteristic function of a Gaussian random
vector y with mean m and covariance P is given by

$$ M_y(u) = E[e^{iu'y}] = e^{iu'm - \frac{1}{2} u'Pu} \tag{3.15} $$

Since $\eta$ in (3.14) is Gaussian, we can use (3.15) to compute

$$ E[x(t)] = E[x(0)] \exp\Big\{ at + \frac{\sigma^2}{\alpha} t + \frac{\sigma^2}{\alpha^2} (e^{-\alpha t} - 1) \Big\} \tag{3.16} $$

Hence (3.11) is first order asymptotically stable if and only if

$$ a < -\frac{\sigma^2}{\alpha} \tag{3.17} $$

(notice that this requires a < 0). Since

$$ \frac{d}{dt} x^p(t) = (pa + p u(t)) x^p(t) \tag{3.18} $$

we have that (3.11) is pth order asymptotically stable if and only if

$$ a < -\frac{p\sigma^2}{\alpha} \tag{3.19} $$

Also, $a = -p\sigma^2/\alpha$ implies pth order stability.
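
The closed-form moment of Example 3.1 is easy to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes the exponential covariance (3.13), for which the variance of the integrated noise is $(2\sigma^2/\alpha)t + (2\sigma^2/\alpha^2)(e^{-\alpha t} - 1)$. It returns the logarithm of the pth moment growth factor and tests the threshold (3.19).

```python
import math

def log_moment_growth(a, sigma2, alpha, p, t):
    """log of E[x(t)^p] / E[x(0)^p] for x' = (a + u) x with covariance (3.13).

    Uses E[exp(p*eta)] = exp(p^2 * Var[eta] / 2) for zero-mean Gaussian eta,
    where eta(t) is the integral of u over [0, t].
    """
    var_eta = (2.0 * sigma2 / alpha) * t \
        + (2.0 * sigma2 / alpha ** 2) * (math.exp(-alpha * t) - 1.0)
    return p * a * t + 0.5 * p * p * var_eta

def is_pth_order_asymptotically_stable(a, sigma2, alpha, p=1):
    """Threshold (3.19): a < -p * sigma^2 / alpha."""
    return a < -p * sigma2 / alpha
```

For instance, with a = -1, $\sigma^2$ = 1, and $\alpha$ = 2, the first moment decays, the second moment sits exactly on the boundary (bounded but not decaying), and the third moment grows.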


Example 3.2 [W23], [M12]: Consider the n-dimensional system (2.1),
where u is a Gaussian random process with statistics (3.3)-(3.4), and
assume that $\mathcal{L}$ is abelian. Then the solution of (2.1) is

$$ x(t) = \exp(A_0 t) \Big\{ \prod_{i=1}^{N} \exp\Big[ A_i \int_0^t u_i(s)\,ds \Big] \Big\} x(0) \tag{3.20} $$

As in the previous example, the statistics of x are completely determined
by those of the integral of the noise process u, and explicit stability
criteria can be derived.

For example, consider the system

$$ \dot{x}(t) = A x(t) + u(t) x(t) \tag{3.21} $$

where A is a given n x n matrix and u is the same as in the preceding
example. It can be shown [M12], [W23] that (3.21) is pth order
asymptotically stable if and only if

$$ \mathrm{Re}(\lambda_i) < -\frac{p\sigma^2}{\alpha} $$

for all eigenvalues $\lambda_i$ of A. For a more complete discussion of the
abelian case, see Willems [W23].

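Example 3.3: Consider the system

$$ \dot{x}(t) = \Big[ \sum_{i=1}^{3} A_i u_i(t) \Big] x(t) $$

where

$$ A_1 = \begin{bmatrix} 0 & 0 \\ -1 & 1 \end{bmatrix}, \qquad A_2 = \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}, \qquad A_3 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \tag{3.22} $$

and u is a stationary Gaussian random process with statistics

$$ E[u(t)] = m = [m_1, m_2, m_3]' \tag{3.23} $$

$$ E[(u(t) - m)(u(s) - m)'] = P(t,s) = P(t-s) \tag{3.24} $$

It is easy to verify that $\mathcal{L}$ is solvable and

$$ P = \begin{bmatrix} 1 & 0 \\ 1 & -1 \end{bmatrix} \tag{3.25} $$

triangularizes $\mathcal{L}$. Thus if y = Px, then

$$ \dot{y}(t) = \begin{bmatrix} u_3(t) & u_2(t) \\ 0 & u_1(t) \end{bmatrix} y(t) \tag{3.26} $$

and

$$ y(t) = \begin{bmatrix} \exp\big[\int_0^t u_3(s)\,ds\big] & \int_0^t \exp\big[\int_\tau^t u_3(s)\,ds + \int_0^\tau u_1(s)\,ds\big]\, u_2(\tau)\,d\tau \\ 0 & \exp\big[\int_0^t u_1(s)\,ds\big] \end{bmatrix} y(0) = Y(t)\,y(0) \tag{3.27} $$

The expectations of the quantities in (3.27) can be evaluated by means
of the characteristic function (3.15). Some simple calculations yield

$$ E[Y_{11}(t)] = \exp\Big[ m_3 t + \frac{1}{2} \int_0^t\!\!\int_0^t P_{33}(\sigma_1 - \sigma_2)\,d\sigma_2\,d\sigma_1 \Big] \tag{3.28} $$

$$ E[Y_{22}(t)] = \exp\Big[ m_1 t + \frac{1}{2} \int_0^t\!\!\int_0^t P_{11}(\sigma_1 - \sigma_2)\,d\sigma_2\,d\sigma_1 \Big] \tag{3.29} $$

$$ E[Y_{12}(t)] = \int_0^t (m_2 + \beta(s)) \exp\Big[ m_3 (t - s) + m_1 s + \frac{1}{2}\gamma(s) \Big]\,ds \tag{3.30} $$

where

$$ \beta(s) = \int_s^t P_{23}(s - \sigma)\,d\sigma + \int_0^s P_{21}(s - \sigma)\,d\sigma \tag{3.31} $$

$$ \gamma(s) = 2\int_s^t\!\!\int_0^s P_{31}(\sigma_1 - \sigma_2)\,d\sigma_2\,d\sigma_1 + \int_0^s\!\!\int_0^s P_{11}(\sigma_1 - \sigma_2)\,d\sigma_2\,d\sigma_1 + \int_s^t\!\!\int_s^t P_{33}(\sigma_1 - \sigma_2)\,d\sigma_2\,d\sigma_1 \tag{3.32} $$

Then conditions for asymptotic stability can be determined from the
closed-form expression

$$ E[x(t)] = P^{-1} E[Y(t)] P \, E[x(0)] \tag{3.33} $$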

For example, suppose that $u_1$, $u_2$, and $u_3$ are independent with
$E[u_i(t)] = m$, i = 1,2,3, and

$$ E[u_i(t) u_i(t+\tau)] = \sigma_i^2 \exp[-\alpha_i |\tau|], \qquad \alpha_i > 0, \quad i = 1,2,3 \tag{3.34} $$

In this case the system (3.22) is first order asymptotically stable if
and only if

$$ m < -\max(\sigma_1^2/\alpha_1, \; \sigma_3^2/\alpha_3) \tag{3.35} $$

Other examples of this technique are discussed in [M12].
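
The triangularization step of Example 3.3 can be checked mechanically. The sketch below is illustrative only. It assumes that the matrices of (3.22) are $A_1$ = [[0, 0], [-1, 1]], $A_2$ = [[1, -1], [1, -1]], $A_3$ = [[1, 0], [1, 0]] (a reading of the source that is consistent with (3.25)-(3.26)), and the helper names are hypothetical. It conjugates each $A_i$ by P and also encodes the threshold (3.35).

```python
def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

P = [[1, 0], [1, -1]]
P_INV = [[1, 0], [1, -1]]  # this particular P is its own inverse

# Assumed reading of the matrices in (3.22).
A = {
    1: [[0, 0], [-1, 1]],
    2: [[1, -1], [1, -1]],
    3: [[1, 0], [1, 0]],
}

def conjugate(a):
    """Return P a P^{-1}, the matrix multiplying u_i in the y = Px coordinates."""
    return matmul2(matmul2(P, a), P_INV)

B = {i: conjugate(a) for i, a in A.items()}

def first_order_stable(m, sigma2, alpha):
    """Threshold (3.35); sigma2 and alpha are dicts holding entries for i = 1 and 3."""
    return m < -max(sigma2[1] / alpha[1], sigma2[3] / alpha[3])
```

Each conjugated matrix is a single elementary upper triangular matrix, which is why $u_3$, $u_2$, and $u_1$ appear in the (1,1), (1,2), and (2,2) positions of (3.26).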


3.3 Bilinear Systems with Colored Noise: The General Case

If the Lie algebra $\mathcal{L}$ is not solvable, then (2.1) cannot be solved
in closed form, and the approach of the previous section is not applicable.
In this section we discuss some approximate methods for deriving
stability conditions.

Blankenship [B26] has used some results from stochastic averaging
theory to derive conditions for the stability of the "slowly varying"
portion of the moments of (2.1) in the case that the noise u(t) is
bounded and satisfies some other technical conditions. However, the
boundedness assumption excludes Gaussian noise processes.

One procedure for deriving sufficient conditions for the pth
order stability (p even) of a general bilinear system driven by
Gaussian noise is based on a method of Brockett [B27]. Assume that
x(t) satisfies

$$ \dot{x}(t) = [A + B u(t)] x(t) \tag{3.36} $$

where u is a Gaussian process satisfying (3.12)-(3.13). We use a simple
inequality [B8, p. 128] to show that

$$ \frac{d}{dt}(x'(t)x(t)) = x'(t)[A + A' + u(t)(B + B')]x(t) \le [\lambda_{\max}(A + A') + u(t)\lambda_{\max}(B + B')]\, x'(t)x(t) \tag{3.37} $$

where $\lambda_{\max}(P)$ denotes the maximum eigenvalue of P. Hence

$$ x'(t)x(t) \le y^2(t) \tag{3.38} $$

where y is a scalar process satisfying

$$ \dot{y}(t) = \frac{1}{2}[\lambda_{\max}(A + A') + u(t)\lambda_{\max}(B + B')]\, y(t), \qquad y(0) = [x'(0)x(0)]^{1/2} \tag{3.39} $$

or, equivalently, we have

$$ x'(t)x(t) \le \eta(t) \tag{3.40} $$

where $\eta$ is a scalar process satisfying

$$ \dot{\eta}(t) = [\lambda_{\max}(A + A') + u(t)\lambda_{\max}(B + B')]\, \eta(t), \qquad \eta(0) = x'(0)x(0) \tag{3.41} $$

The condition of Example 3.1 then states that (3.41) is pth order
asymptotically stable (which implies that (3.36) is (2p)-th order
asymptotically stable) if

$$ \lambda_{\max}(A + A') < -p\sigma^2[\lambda_{\max}(B + B')]^2/\alpha \tag{3.42} $$

The stability condition (3.42) could have been derived from (3.37) by
a direct application of the Gronwall-Bellman inequality [B8, p. 19].
However, the present formulation suggests generalizations in a certain
direction which will be discussed at the end of this section.
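
The criterion (3.42) involves only two eigenvalue computations, so it is simple to evaluate. The following is a minimal sketch for 2x2 systems, using the closed-form eigenvalues of a symmetric 2x2 matrix; the function names are hypothetical, and the matrix `a_hyp` in the usage note is an assumed example (not taken from the text) chosen so that $\lambda_{\max}(A + A') = -3 + \sqrt{5}$.

```python
import math

def lambda_max_sym_part(m):
    """Largest eigenvalue of M + M' for a real 2x2 matrix M (nested lists)."""
    s11 = 2.0 * m[0][0]
    s22 = 2.0 * m[1][1]
    s12 = m[0][1] + m[1][0]
    mean = 0.5 * (s11 + s22)
    radius = math.hypot(0.5 * (s11 - s22), s12)
    return mean + radius

def criterion_3_42(a_mat, b_mat, sigma2, alpha, p=1):
    """True if the sufficient condition (3.42) for (2p)-th order stability holds."""
    lam_a = lambda_max_sym_part(a_mat)
    lam_b = lambda_max_sym_part(b_mat)
    return lam_a < -p * sigma2 * lam_b ** 2 / alpha
```

With B = I and the assumed matrix `a_hyp = [[-1.0, 2.0], [0.0, -2.0]]`, the criterion holds exactly when $\sigma^2/\alpha < (3 - \sqrt{5})/(4p) \approx 0.191/p$. For a damped oscillator with A = [[0, 1], [-1, -r]] and B = [[0, 0], [1, 0]], $\lambda_{\max}(A + A') = 0$, so the criterion can never be satisfied.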

The following examples indicate that this procedure, while providing
useful stability criteria in some cases, often provides little
or no information about the stability of (3.36). This is to be expected,
because we have essentially bounded the process x in (3.36) by a scalar
process, thus neglecting many of the important characteristics of x.


Example 3.4; Let B = I and 



Then a simple computation shows that $\lambda_{\max}(A + A') = -3 + \sqrt{5}$, and the
criterion (3.42) implies that (3.36) is (2p)-th order asymptotically
stable if

$$ \frac{\sigma^2}{\alpha} < \frac{0.191}{p} \tag{3.43} $$

Since $\mathcal{L}$ is abelian, Example 3.2 gives the necessary and sufficient
condition for asymptotic stability:



(3.44) 


Thus (3.42) provides a sufficient condition which is, however,
conservative (i.e., (3.42) provides a smaller region of stability than
Example 3.2).


Example 3.5: Let A be arbitrary and let

$$ B = \begin{bmatrix} -1 & 2 \\ 0 & -1 \end{bmatrix} $$

Then $\lambda_{\max}(B + B') = 0$, and the condition (3.42) implies (2p)-th order
asymptotic stability of (3.36) if

$$ \lambda_{\max}(A + A') < 0 \tag{3.45} $$

Notice that this result is independent of the noise statistics.


Example 3.6 : Let B = I and 


A = 



Since $\lambda_{\max}(A + A') = 0$, the condition (3.42) yields no information
about asymptotic stability. However, we know from Example 3.2 that
a necessary and sufficient condition for pth order asymptotic stability
of (3.36) is


— — < — • (3.46) 

a p 

Example 3.7: Consider the damped harmonic oscillator, in which

$$ A = \begin{bmatrix} 0 & 1 \\ -1 & -r \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} $$

where r > 0. We again have $\lambda_{\max}(A + A') = 0$, so (3.42) provides no
information about stochastic stability.

The damped harmonic oscillator of Example 3.7, for which the
general criterion (3.42) is not useful, has been considered by Martin
[M1] from a different point of view. Martin investigated the second
order (mean-square) asymptotic stability of only the first component
$x_1$ (the position). He expanded the solution $x_1(t)$ in a Volterra series
and bounded this series term-by-term with the solution of a scalar
equation, thus obtaining sufficient conditions for the mean-square
asymptotic stability of $x_1$. He then optimized over the parameters
of the scalar system in order to obtain the largest region of stability.

Both of these methods basically consist of bounding x'(t)x(t)
(or $x_1^2(t)$), where x is the solution of (3.36), by $y^2(t)$ (where y is
the solution of a scalar system). The results of Example 3.1 then
provide a sufficient condition for (2p)-th order asymptotic stability.
However, the techniques of Section 3.2 enable us to compute necessary
and sufficient stability conditions for systems more general than scalar
systems, namely, systems in which $\mathcal{L}$ is solvable. It thus seems
reasonable to conjecture that better stability conditions (i.e.,
larger regions of asymptotic stability) can be derived by bounding
x'(t)x(t) (where x is the solution of (3.36)) by y'(t)y(t) (where y is
the solution of

$$ \dot{y}(t) = (A + Bu(t))y(t) \tag{3.47} $$

and A and B are upper triangular). We have attempted to generalize
to the solvable case both of the above methods of bounding, but our
efforts have been unsuccessful to date.

CHAPTER 4


MOTIVATION: ESTIMATION OF ROTATIONAL 
PROCESSES IN THREE DIMENSIONS 


4.1 Introduction 

Many practical estimation problems can be analyzed in the framework
of bilinear systems evolving on Lie groups or homogeneous spaces.
For example, several communications problems (such as the phase
tracking example of Chapter 6) can be viewed as bilinear systems
evolving on the circle $S^1$ [G2], [L2], [M9], [W3], [W6], [W7], [W12].
As we shall see in subsequent chapters, the fact that $S^1$ is an abelian
Lie group (i.e., rotations in one dimension commute) provides an
important simplification. In this chapter we formulate several
problems of practical importance involving rotations in three
dimensions (we will rely substantially on the discussion in [W12]).
These problems are considerably more difficult than those in one
dimension, since rotations in three-space do not commute [M8], [S4], [W2].

In this chapter, we will make several approximations in order to
develop models for several physical systems. These approximations
are often justifiable. However, we use these models primarily to
find useful filter structures for such problems. As we will show in
Chapters 5 and 6, these models do lead to novel and useful filters.

The problem of estimating the angular velocity and orientation
(or attitude) of a rigid body has been studied by many authors [B4],
[B10], [B18], [L6], [L8], [M10], [S4], [S5], [W2], [W13]. In general the
optimal estimator (or filter) is infinite dimensional, so practical
estimation techniques for these problems are inherently suboptimal.
One structural feature of the rigid body orientation-angular velocity
problem which is very important is that the space of possible orientations
defines a Lie group [W2], [S4], [S5], and the combined
orientation-angular velocity space is the tangent bundle of the orientation space
and is thus a homogeneous space [B11]; in fact, it can be given a Lie
group structure isomorphic to the Euclidean group in three-space [M4].
There are also Lie-theoretic interpretations of four of the most widely
used representations of the attitude of a rigid body: direction
cosines, unit quaternions, Euler angles, and Cayley-Klein parameters.
We will exploit this Lie group structure in our consideration of the
estimation problem.

We will consider only the direction cosine and quaternion
descriptions; the other representations are discussed in [W2] and [S5].

4.2 Attitude Estimation with Direction Cosines

The orientation of a rigid body can be described by the matrix of
direction cosines [W17], [E4] between two sets of orthogonal axes:
one rotating with the body (b-frame) and the other an inertial
reference frame (i-frame). The direction cosine matrix is a 3x3
orthogonal matrix (X'X = I) with det X = +1. The set of all such matrices
forms the matrix Lie group SO(3) [B1], [S1], [W2] (see also Appendices A
and B). Let $C_\alpha^\beta$ denote the direction cosine matrix of the $\beta$-frame with
respect to the $\alpha$-frame. If the 3-vector $\xi(t)$ is the angular velocity
of the body with respect to inertial space in body coordinates, the
evolution of the orientation of the body is described by the bilinear
equation
A 1



form a basis for so(3), the matrix Lie algebra associated with SO(3).

The fact that SO(3) is a simple Lie group (see Appendix A) complicates
the study of dynamics on SO(3), because this implies that there is no
global closed-form solution to (4.1). Wei and Norman [W14], [W15] have
shown the existence of local expressions for the solution to equations
of the form (2.1); however, these solutions are global only in certain
cases. We will exploit this fact for the case of solvable Lie groups
in order to obtain finite dimensional optimal nonlinear estimators in
the next section. We also note that the local Wei-Norman representation
of the solution of (4.1) corresponds to an Euler angle description,
which is well known to exist only locally (see [W17], where this fact
is related to the phenomenon of "gimbal lock").

We assume that the angular velocity in (4.2) is a stochastic
process satisfying

$$ d\xi(t) = f(t)\,dt + A(t)\xi(t)\,dt + G(t)\,dw(t) \tag{4.3} $$

where f and G are known, $\xi(0)$ is normally distributed, and w is a
standard Wiener process independent of $\xi(0)$. Here f is a vector of
known torques acting on the body, and the Brownian motion term
represents random disturbances. The angular velocity equation (4.3) is
simpler than the usual nonlinear Euler equations; this approximation
is reasonable in some cases (see [W12]).

We will consider three different measurement processes: one
motivated by a strapdown inertial navigation system, one by an inertial
system in which a platform is to be kept inertially fixed, and one by a
star tracker. In a strapdown system [W17], one receives noisy information
about either angular velocity or incremental angle changes.
Assuming that the size of the increment is small, either type of
information can be modeled (see [W12]) by the Ito equation

$$ dz(t) = C(t)\xi(t)\,dt + S^{1/2}(t)\,dv(t) \tag{4.4} $$

where S = S' > 0 and v is a standard Wiener process, independent of $\xi$.

A second type of observation process is suggested by an inertial
system equipped with a platform that is to "instrument" (i.e., remain
fixed with respect to) the inertial reference frame. We must consider
the direction cosines relating the body-fixed frame (b-frame), platform
frame (p-frame), and inertial reference frame (i-frame). Recall that

$$ X(t) = C_i^b(t) \tag{4.5} $$

Also, by noting the relative orientation of the platform and the body
(perhaps by reading off gimbal angles [W17]), we can measure

$$ M(t) = C_p^b(t) \tag{4.6} $$

Let

$$ V(t) = C_p^i(t) \tag{4.7} $$

represent the noise due to platform misalignment. We model the gyro
drifts and other inaccuracies which cause platform misalignment by the
equations

$$ \eta_p(t) = \xi_p(t) + v_p(t) \tag{4.8} $$

$$ \eta_b(t) = \xi_b(t) + v_b(t) \tag{4.9} $$

where $\eta_p$ and $\eta_b$ denote the angular velocity of the b-frame with respect
to the p-frame in p and b coordinates, respectively; $\xi_p$ and $\xi_b$ denote
the angular velocity of the b-frame with respect to the i-frame in p
and b coordinates, respectively; and $v_p$ and $v_b$ denote the error in the
measurement (the angular velocity of the i-frame with respect to the
p-frame) in p and b coordinates, respectively. The error process $v_p$
will be modeled as a Brownian motion process with strength S(t).

We now derive an equation for the platform misalignment V(t) (this
derivation is due to Willsky [W20]). For ease of notation, the derivation
will be performed using Stratonovich calculus (đ will denote the
Stratonovich differential). The matrix M(t) satisfies

$$ dM(t) = [\tilde{\xi}_b(t)\,dt + đ\tilde{v}_b(t)]\, M(t) \tag{4.10} $$

where

$$ \tilde{a} = \sum_{i=1}^{3} R_i a_i \tag{4.11} $$

Since [E4, p. 119]

$$ \tilde{\eta}_b(t) = M(t)\,\tilde{\eta}_p(t)\,M'(t) \tag{4.12} $$

we have

$$ dM(t) = M(t)[\tilde{\xi}_p(t)\,dt + đ\tilde{v}_p(t)] \tag{4.13} $$

Since our measurement consists of

$$ M(t) = X(t)V(t) \tag{4.14} $$

the platform misalignment satisfies

$$ V(t) = X'(t)M(t) \tag{4.15} $$

and

$$ dV(t) = -X'(t)\tilde{\xi}_b(t)M(t)\,dt + X'(t)M(t)[\tilde{\xi}_p(t)\,dt + đ\tilde{v}_p(t)]M'(t)M(t) $$
$$ \phantom{dV(t)} = -X'(t)\tilde{\xi}_b(t)M(t)\,dt + X'(t)\tilde{\xi}_b(t)M(t)\,dt + X'(t)M(t)\,đ\tilde{v}_p(t) $$
$$ \phantom{dV(t)} = V(t)\,đ\tilde{v}_p(t) \tag{4.16} $$

or, in Ito form,

$$ dV(t) = V(t)\Big\{ \sum_{i=1}^{3} R_i\,dv_i(t) + \frac{1}{2}\sum_{i,j=1}^{3} S_{ij}(t) R_i R_j\,dt \Big\} \tag{4.17} $$

and V is a left-invariant SO(3) Brownian motion (see Section 6.3 and
[L8], [M8], [W2]).

The third measurement process is motivated by the use of a star
tracker [F2], [F3], [I4], [P1], [R1]. In a star tracker, the star chosen
as a reference has associated with it a known unit position vector a
in inertial coordinates, pointing from the origin of the inertial frame
along the line of sight to the star. The vector a must be transformed
to take into account the position and velocity of the body; thus a will
be time-varying if the body is in motion (for example, if we are
estimating the attitude of a satellite in orbit). A second type of
time dependence in a arises because different stars (with different
position vectors) can be used for sightings. As in [F2], the
measurement consists of noisy observations of the unit position
vector of the star in body coordinates (that is, observations of
$C_i^b(t)a(t)$ plus white noise). We model such observations via the Ito
equation

$$ dz(t) = X(t)a(t)\,dt + S^{1/2}(t)\,dv(t) \tag{4.18} $$

where S = S' > 0 and v is a standard Wiener process.

For all three measurement processes associated with the state
equations (4.1) and (4.3), the problem of interest is that of estimating
X(t) and $\xi(t)$ given the past observations: $z^t = \{z(s), 0 \le s \le t\}$ if we use
(4.4) or (4.18), or $M^t = \{M(s), 0 \le s \le t\}$ if our observations satisfy (4.14).
We will consider an estimation criterion of the constrained least-squares
type; i.e., we wish to find the estimate $(\hat{X}(t|t), \hat{\xi}(t|t))$ that minimizes
the conditional error covariance

$$ J = E[(\xi(t) - \hat{\xi}(t|t))'(\xi(t) - \hat{\xi}(t|t)) + \mathrm{tr}\{(X(t) - \hat{X}(t|t))'(X(t) - \hat{X}(t|t))\} \mid y^t] \tag{4.19} $$

subject to the constraint

$$ \hat{X}(t|t)'\hat{X}(t|t) = I \tag{4.20} $$

Here $y^t$ denotes either $z^t$ or $M^t$, depending on which observation process
we are considering. It is well known [B16], [B21], [C4] that the optimal
estimate for the criterion (4.19)-(4.20) is given by

$$ \hat{\xi}(t|t) = E[\xi(t) \mid y^t] \tag{4.21} $$

$$ \hat{X}(t|t) = E[X(t) \mid y^t]\,\{E[X(t) \mid y^t]'\,E[X(t) \mid y^t]\}^{-1/2} \tag{4.22} $$

Notice that both of the observation processes (4.4) and (4.18) are
linear in the augmented state (X(t), $\xi(t)$). The implications of linear
measurements for bilinear systems will be explored in Chapters 5 and 6
with regard to estimation problems.


4.3 Attitude Estimation with Quaternions

A second way of characterizing the attitude of a rotating rigid
body is by a quaternion. The unit quaternions Q are defined by

$$ Q = \Big\{ q = q_1 + q_2 i + q_3 j + q_4 k \;\Big|\; \sum_{i=1}^{4} q_i^2 = 1 \Big\} \tag{4.23} $$

where the group multiplication on Q is defined by the relations

$$ i^2 = j^2 = k^2 = -1, \qquad ij = -ji = k, \qquad jk = -kj = i, \qquad ki = -ik = j \tag{4.24} $$

We note that there is a Lie group isomorphism [W11] between Q and the
unit 3-sphere

$$ S^3 \triangleq \{ (x_1, x_2, x_3, x_4) \in R^4 \mid x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1 \} \tag{4.25} $$

where we identify

$$ (x_1, x_2, x_3, x_4) \leftrightarrow x_1 + x_2 i + x_3 j + x_4 k \tag{4.26} $$

A vector $x \in R^3$ can be represented as a quaternion with $q_1 = 0$:

$$ x = x_1 i + x_2 j + x_3 k \tag{4.27} $$

If the quaternion q represents the orientation of the $\beta$-frame with
respect to the $\alpha$-frame, then the vector x is transformed from
$\alpha$-coordinates to $\beta$-coordinates by

$$ x_\beta = q\,x_\alpha\,q^* \tag{4.28} $$

where the conjugate of q is defined by

$$ q^* = q_1 - q_2 i - q_3 j - q_4 k \tag{4.29} $$


Comparing (4.28) to the equivalent expression in terms of direction
cosines

$$ x_\beta = C_\alpha^\beta x_\alpha \tag{4.30} $$

we see that there is a Lie group homomorphism g: Q -> SO(3) given by

$$ g(q_1 + q_2 i + q_3 j + q_4 k) = \begin{bmatrix} q_1^2 + q_2^2 - q_3^2 - q_4^2 & 2(q_2 q_3 - q_1 q_4) & 2(q_2 q_4 + q_1 q_3) \\ 2(q_2 q_3 + q_1 q_4) & q_1^2 - q_2^2 + q_3^2 - q_4^2 & 2(q_3 q_4 - q_1 q_2) \\ 2(q_2 q_4 - q_1 q_3) & 2(q_3 q_4 + q_1 q_2) & q_1^2 - q_2^2 - q_3^2 + q_4^2 \end{bmatrix} \tag{4.31} $$

Notice that

$$ g(q) = g(-q) \quad \forall\, q \in Q \tag{4.32} $$

In fact, one can show that $Q \cong S^3$ is the simply connected covering
group [W11] of SO(3), and we have the Lie group isomorphism

$$ SO(3) \cong Q/\{\pm 1\} \tag{4.33} $$

where $\{\pm 1\}$ is the subgroup of Q containing those two elements.
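
The relation between the quaternion product (4.28) and the matrix (4.31), and the two-to-one property (4.32), can be checked numerically. The following sketch is illustrative only: the function names are hypothetical, quaternions are stored as 4-tuples $(q_1, q_2, q_3, q_4)$ with $q_1$ the scalar part, and the entries of `g` follow the standard rotation matrix associated with $x \mapsto q x q^*$, which is assumed here to be the layout of (4.31).

```python
def qmul(a, b):
    """Hamilton product of quaternions a and b, using the relations (4.24)."""
    a1, a2, a3, a4 = a
    b1, b2, b3, b4 = b
    return (
        a1 * b1 - a2 * b2 - a3 * b3 - a4 * b4,
        a1 * b2 + a2 * b1 + a3 * b4 - a4 * b3,
        a1 * b3 - a2 * b4 + a3 * b1 + a4 * b2,
        a1 * b4 + a2 * b3 - a3 * b2 + a4 * b1,
    )

def conj(q):
    """Quaternion conjugate, as in (4.29)."""
    return (q[0], -q[1], -q[2], -q[3])

def rotate(q, x):
    """Transform the 3-vector x by q x q*, as in (4.28)."""
    r = qmul(qmul(q, (0.0, x[0], x[1], x[2])), conj(q))
    return r[1:]

def g(q):
    """3x3 matrix of the map x -> q x q* for a unit quaternion q (assumed form of (4.31))."""
    q1, q2, q3, q4 = q
    return [
        [q1*q1 + q2*q2 - q3*q3 - q4*q4, 2*(q2*q3 - q1*q4), 2*(q2*q4 + q1*q3)],
        [2*(q2*q3 + q1*q4), q1*q1 - q2*q2 + q3*q3 - q4*q4, 2*(q3*q4 - q1*q2)],
        [2*(q2*q4 - q1*q3), 2*(q3*q4 + q1*q2), q1*q1 - q2*q2 - q3*q3 + q4*q4],
    ]

def matvec(m, x):
    """Product of a 3x3 matrix (nested lists) with a 3-vector."""
    return tuple(sum(m[i][k] * x[k] for k in range(3)) for i in range(3))
```

Because every entry of `g(q)` is quadratic in the components of q, replacing q by -q leaves the matrix unchanged, which is the content of (4.32).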


If $\xi(t)$ is the angular velocity of a rigid body with respect to
inertial space in body coordinates and q is the quaternion representing
the orientation of the body frame with respect to inertial space, then
the orientation equation corresponding to (4.1) is

$$ \dot{q}(t) = \frac{1}{2}\Big[ \sum_{i=1}^{3} \xi_i(t)\bar{R}_i \Big] q(t) \tag{4.34} $$

where the vector corresponding to the quaternion q is $q = (q_1, q_2, q_3, q_4)'$
and the $\bar{R}_i$ given by

$$ \bar{R}_1 = \begin{bmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad \bar{R}_2 = \begin{bmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{bmatrix}, \quad \bar{R}_3 = \begin{bmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \tag{4.35} $$

form the basis of a Lie algebra isomorphic to so(3). If q(0) is a
unit quaternion (i.e., q'(0)q(0) = 1), then q'(t)q(t) = 1 for all t.
Thus q evolves on the quaternion group for all t, or equivalently,
q evolves on $S^3$.

Thus, one can consider attitude estimation problems by using the
quaternion equation (4.34), the angular velocity equation (4.3), and an
appropriate measurement equation. In the case of the strapdown
navigation system of Section 4.2, equation (4.4) is again the
appropriate measurement. For the star tracker, the measurement
corresponding to (4.18) is

$$ dz(t) = g(q(t))a(t)\,dt + S^{1/2}(t)\,dv(t) \tag{4.36} $$

where g(q) is defined in (4.31). Notice that this measurement is
quadratic in q. As we remarked in Chapter 2, the bilinear system
(4.34) with quadratic measurement (4.36) can be transformed into a
bilinear system with a linear measurement by augmenting the state of
(4.34).

We can again use a constrained least-squares estimation criterion
for the system (4.3), (4.34) evolving on Q, with measurements z given
by (4.4) or (4.36) (see also [B15] and [G3]). In this case, we wish
to find the estimate $(\hat{q}(t|t), \hat{\xi}(t|t))$ that minimizes

$$ J = E[(\xi(t) - \hat{\xi}(t|t))'(\xi(t) - \hat{\xi}(t|t)) + (q(t) - \hat{q}(t|t))'(q(t) - \hat{q}(t|t)) \mid z^t] \tag{4.37} $$

subject to the constraint

$$ \|\hat{q}(t|t)\|^2 = \hat{q}(t|t)'\hat{q}(t|t) = 1 \tag{4.38} $$

The optimal estimate is then given by

$$ \hat{\xi}(t|t) = E[\xi(t) \mid z^t] \tag{4.39} $$

$$ \hat{q}(t|t) = \frac{E[q(t) \mid z^t]}{\|E[q(t) \mid z^t]\|} \tag{4.40} $$

where $E[\,\cdot \mid z^t]$ again denotes conditional expectation.
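
The constrained estimate amounts to projecting the conditional mean of q back onto the unit sphere. The following is a minimal sketch of that normalization step only: the function names are hypothetical, and the sample average stands in for a conditional mean that would in practice come from a filter, which is an assumption made purely for illustration.

```python
import math

def mean_quaternion(samples):
    """Componentwise average of a list of 4-tuples (a stand-in for E[q | z])."""
    n = len(samples)
    return tuple(sum(s[i] for s in samples) / n for i in range(4))

def constrained_quaternion_estimate(q_mean):
    """Normalize a (generally non-unit) mean quaternion to unit length."""
    norm = math.sqrt(sum(c * c for c in q_mean))
    if norm == 0.0:
        raise ValueError("mean quaternion is zero; the normalized estimate is undefined")
    return tuple(c / norm for c in q_mean)
```

The average of unit quaternions generally has norm less than one, so the normalization is not a cosmetic step; without it the estimate would not satisfy the constraint (4.38).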
4.4 Satellite Tracking


A simplified satellite tracking problem can also be analyzed in
the framework of bilinear systems. Consider a satellite in circular
orbit about some celestial body. Because of a variety of effects,
including anomalies in the gravitational field of the body, effects of
the gravitational fields of nearby bodies, and the effects of solar
pressure, the orbit of the satellite is perturbed. In this case, the
position x of the satellite can be described by the simplified
bilinear model [W12]

$$ dx(t) = \Big\{ \Big[ \sum_{i=1}^{3} f_i(t)R_i + \frac{1}{2}\sum_{i,j=1}^{3} Q_{ij}(t)R_iR_j \Big] dt + \sum_{i=1}^{3} R_i\,dw_i(t) \Big\} x(t) \tag{4.41} $$

where $f_i$ are the components of the nominal angular velocity and $w_i$ are
the components of a Wiener process with strength Q(t). If E[x'(0)x(0)] = 1,
then E[x'(t)x(t)] = 1 for all t; thus x evolves on the 2-sphere $S^2$ (the
same statement can be made almost surely [L8]). We note that the
assumption in (4.41) that the perturbations in the angular velocity
are white is a simplification. For example, the anomalies in the
gravitational field of the celestial body are spatially correlated and
constitute a random field [P2], [W8]. However, the simplified model
(4.41) leads to simple but accurate on-line tracking schemes (see
Chapter 6).

If we are then given noisy observations of the satellite position

$$ dz(t) = H(t)x(t)\,dt + S^{1/2}(t)\,dv(t) \tag{4.42} $$

where v is a standard Wiener process and S(t) = S'(t) > 0, our problem
is to estimate x(t) given $\{z(s), 0 \le s \le t\}$, and we can again use a
constrained least-squares criterion. Notice that this problem is also
of the bilinear system-linear observation type described in Chapter 2.

Consideration of the many practical problems described in this
chapter, in addition to the theoretical questions posed in Chapter 1,
has led to the study of estimation problems for similar bilinear
models. These will be discussed in the next two chapters.


CHAPTER 5 


FINITE DIMENSIONAL OPTIMAL NONLINEAR ESTIMATORS 
5.1 Introduction

In this chapter we will exploit the structure of particular classes
of systems in order to prove that the optimal estimators for these
systems are finite dimensional. The general class of systems is given
by a linear Gauss-Markov process $\xi$ which feeds forward into a nonlinear
system with state x. Our goal is to estimate $\xi$ and x given noisy linear
observations of $\xi$. Specifically, consider the system

$$ d\xi(t) = F(t)\xi(t)\,dt + G(t)\,dw(t) \tag{5.1} $$

$$ dx(t) = a_0(x(t))\,dt + \sum_{i=1}^{N} a_i(x(t))\,\xi_i(t)\,dt \tag{5.2} $$

$$ dz(t) = H(t)\xi(t)\,dt + R^{1/2}(t)\,dv(t) \tag{5.3} $$

where $\xi(t)$ is an n-vector, x(t) is a k-vector, z(t) is a p-vector, w and
v are independent standard Brownian motion processes, R > 0, $\xi(0)$ is a
Gaussian random variable independent of w and v, x(0) is independent of
$\xi(0)$, w, and v, and $\{a_i, i = 0, \ldots, N\}$ are analytic functions of x. Also,
we define Q(t) = G(t)G'(t). It will be assumed (for technical reasons
which will become evident later in this chapter) that [F(t), G(t), H(t)]
is completely controllable and observable [B8].

As shown by Brockett [B25] in the deterministic case, considerable
insight can be gained by considering the Volterra series expansion of
the linear-analytic system (5.2). The Volterra series expansion for
the ith component of x is given by

$$ x_i(t) = w_{0i}(t) + \sum_{j=1}^{\infty} \sum_{k_1,\ldots,k_j=1}^{n} \int_0^t \!\cdots\! \int_0^t w_{ji}^{(k_1,\ldots,k_j)}(t,\sigma_1,\ldots,\sigma_j)\, \xi_{k_1}(\sigma_1) \cdots \xi_{k_j}(\sigma_j)\,d\sigma_1 \cdots d\sigma_j \tag{5.4} $$

where the jth order kernel $w_{ji}^{(k_1,\ldots,k_j)}$ is a locally bounded, piecewise
continuous function. We will consider, without loss of generality, only
triangular kernels, which satisfy $w_{ji}^{(k_1,\ldots,k_j)}(t,\sigma_1,\ldots,\sigma_j) = 0$ if
$\sigma_{l+m} > \sigma_m$; l, m = 1, 2, 3, .... We say that a kernel $w(t,\sigma_1,\ldots,\sigma_j)$ is
separable if it can be expressed as a finite sum

$$ w(t,\sigma_1,\ldots,\sigma_j) = \sum_{i=1}^{m} \gamma_0^i(t)\,\gamma_1^i(\sigma_1)\,\gamma_2^i(\sigma_2) \cdots \gamma_j^i(\sigma_j) \tag{5.5} $$

Brockett [B25] discusses the convergence of (5.4) in the deterministic
case, but we will not consider this question in the general
stochastic case. We will be more concerned with the case in which the
linear-analytic system (5.2) has a finite Volterra series, that is, the
expansion (5.4) has a finite number of terms. Brockett shows that a
finite Volterra series has a bilinear realization if and only if the
kernels are separable. Hence, a proof similar to that of Martin [M1]
of the existence and uniqueness of solutions to a bilinear system driven
by the Gauss-Markov process (5.1) implies that a finite Volterra series
in $\xi$ with separable kernels is well-defined in the mean-square sense.





As discussed in Chapter 1, our objective is the computation of the
conditional means $\hat{\xi}(t|t)$ and $\hat{x}(t|t)$. The computation of $\hat{\xi}(t|t)$ can be
performed by the finite dimensional (linear) Kalman-Bucy filter; moreover,
the conditional density of $\xi(t)$ given $z^t$ is Gaussian with mean $\hat{\xi}(t|t)$ and
nonrandom covariance P(t) [J1], [K1]. As discussed in Chapter 1, the
computation of $\hat{x}(t|t)$ requires in general an infinite dimensional system
of equations; it is not computed, as one might naively guess, merely by
substituting $\hat{\xi}(t|t)$ into (5.2) in place of $\xi(t)$ and solving that equation.
We shall prove that $\hat{x}(t|t)$ can be computed with a finite dimensional
nonlinear estimator if the ith component of the solution to (5.2) can
be expressed in the form

$$ x_i(t) = e^{\xi_j(t)}\,\eta(t) \tag{5.6} $$

where $\xi_j$ is the jth component of $\xi$ (for some j) and $\eta$ is a finite
Volterra series in $\xi$ with separable kernels.
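
The remark that the conditional mean of x is not obtained by substituting the conditional mean of $\xi$ can be seen in the simplest exponential case. The following worked equation is a sketch under the assumption, consistent with the Gaussian conditional density noted above, that $\xi_j(t)$ given $z^t$ is Gaussian with mean $\hat{\xi}_j(t|t)$ and variance $P_{jj}(t)$; it uses the same Gaussian moment identity as (3.15).

```latex
E\big[ e^{\xi_j(t)} \,\big|\, z^t \big]
  = \exp\Big\{ \hat{\xi}_j(t|t) + \tfrac{1}{2} P_{jj}(t) \Big\}
  \;\neq\; \exp\big\{ \hat{\xi}_j(t|t) \big\}
  \qquad \text{whenever } P_{jj}(t) > 0 .
```

The naive substitution therefore underestimates the conditional mean by the factor $e^{P_{jj}(t)/2}$, and the correction depends on the error covariance as well as on the estimate itself.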

It is easy to show, using Brockett's results on finite Volterra
series, that each term in (5.6) can be realized by a bilinear system
of the form

$$ \dot{x}(t) = \xi_j(t)x(t) + \sum_{k=1}^{n} A_k(t)\,\xi_k(t)\,x(t) \tag{5.7} $$

where the $A_k$ are strictly upper triangular (zero on and below the
diagonal). For such systems, the Lie algebra $\mathcal{L}$ is nilpotent (see
(2.3)). In Section 5.3, we shall show conversely that if (5.2) is a
bilinear system with $\mathcal{L}$ nilpotent, its solution can be written as a
finite sum of terms given by (5.6); hence, such systems also have
finite dimensional optimal estimators. These results thus generalize
the results of Lo and Willsky [L2] (for the abelian case) and Willsky
[W4]. The abelian discrete-time problem is also considered by Johnson
and Stear [J2].

In Section 5.2 we state the major theorems concerning finite 
dimensional estimators for systems described by Volterra series and 
we give an example. Section 5.3 contains the corresponding results 
for bilinear systems. In Section 5.4, suboptimal estimators are
constructed for some classes of systems to which the previous results do
not apply.

5.2 A Class of Finite Dimensional Optimal Nonlinear Estimators 

The first two theorems state finite dimensional estimation results
for certain classes of nonlinear systems. The proofs are contained in
this section and in Appendix D; an example follows.

Theorem 5.1: Consider the linear system described by (5.1), (5.3),
and define the scalar-valued process

$$ x(t) = e^{\xi_j(t)}\,\eta(t) \tag{5.8} $$

where $\eta$ is a finite Volterra series in $\xi$ with separable kernels. Then
$\hat{\eta}(t|t)$ and $\hat{x}(t|t)$ can be computed with a finite dimensional system of
nonlinear stochastic differential equations driven by the innovations
$d\nu(t) = dz(t) - H(t)\hat{\xi}(t|t)\,dt$.

Theorem 5.2: Consider the linear system (5.1), (5.3), and define
the scalar-valued processes

$$\eta(t) = \int_0^t \int_0^{\sigma_1} \cdots \int_0^{\sigma_{j-1}} \xi_{k_1}(\sigma_{m_1}) \cdots \xi_{k_i}(\sigma_{m_i})\, \gamma_1(\sigma_1) \cdots \gamma_j(\sigma_j)\, d\sigma_1 \cdots d\sigma_j \qquad (5.9)$$

$$x(t) = e^{\xi_l(t)}\,\eta(t) \qquad (5.10)$$

where $\{\gamma_k\}$ are deterministic functions of time and $i > j$. Then $\hat{\eta}(t|t)$
and $\hat{x}(t|t)$ can be computed with a finite dimensional system of
nonlinear stochastic equations driven by the innovations.

The distinction between Theorems 5.1 and 5.2 lies in the fact that
$i > j$ in (5.9); i.e., there are more $\xi$'s than integrals. On the other
hand, each term in the finite Volterra series in (5.8) has $i = j$ and the
$\sigma$'s are distinct. As Brockett [B25] remarks, we can consider (5.9) as
a single term in a Volterra series if we allow the kernel to contain
impulse functions. As we will show in Lemma D.2, a term (5.9) with
$i < j$ (more integrals than $\xi$'s) can be rewritten as a Volterra term
with $i = j$, so Theorem 5.1 also applies in this case.


Proof of Theorem 5.1: We consider one term in the finite Volterra
series; since the kernels are separable, we can assume without loss of
generality that this term has the form

$$\eta(t) = \int_0^t \int_0^{\sigma_1} \cdots \int_0^{\sigma_{j-1}} \xi_{k_1}(\sigma_1) \cdots \xi_{k_j}(\sigma_j)\, \gamma_1(\sigma_1) \cdots \gamma_j(\sigma_j)\, d\sigma_1 \cdots d\sigma_j \qquad (5.11)$$

The theorem is proved by induction on $j$, the order of the Volterra term
(5.11). We now give the proof for $j = 1$; the proof by induction is given
in Appendix D.


If $j = 1$, then

$$\eta(t) = \int_0^t \xi_{k_1}(\sigma_1)\,\gamma_1(\sigma_1)\,d\sigma_1 \qquad (5.12)$$

and $\eta(t)$ is a linear function of $\xi$. Hence, if the state $\xi$ of (5.1) is
augmented with $\eta$, the resulting system is also linear. Then the
Kalman-Bucy filter for the system described by (5.1), (5.3), (5.12) generates
$\hat{\xi}(t|t)$ and $\hat{\eta}(t|t)$. In order to prove that $\hat{x}(t|t)$ is "finite
dimensionally computable" (FDC), we need the following lemma. First we define,
for $\sigma_1, \sigma_2 \le t$, the conditional cross-covariance matrix

$$P(\sigma_1, \sigma_2, t) = E[(\xi(\sigma_1) - \hat{\xi}(\sigma_1|t))(\xi(\sigma_2) - \hat{\xi}(\sigma_2|t))' \mid z^t] \qquad (5.13)$$

(where $\hat{\xi}(\sigma|t) = E[\xi(\sigma) \mid z^t]$).


Lemma 5.1: The joint conditional density
$p_{\xi(\sigma_1),\xi(\sigma_2)}(\nu, \nu' \mid z^t)$ is
Gaussian with nonrandom conditional cross-covariance $P(\sigma_1, \sigma_2, t)$; i.e.,
$P(\sigma_1, \sigma_2, t)$ is independent of $\{z(s),\ 0 \le s \le t\}$.

Proof: First, the conditional density is Gaussian because $\xi$ and
$z$ are jointly Gaussian random processes. Assume $\sigma_1 < \sigma_2$; then

$$p_{\xi(\sigma_1),\xi(\sigma_2)}(\nu, \nu' \mid z^t) = p_{\xi(\sigma_1)}(\nu \mid \xi(\sigma_2) = \nu',\ z^{\sigma_2},\ z^t_{\sigma_2})\; p_{\xi(\sigma_2)}(\nu' \mid z^t) \qquad (5.14)$$

$$\qquad = p_{\xi(\sigma_1)}(\nu \mid \xi(\sigma_2) = \nu',\ z^{\sigma_2})\; p_{\xi(\sigma_2)}(\nu' \mid z^t) \qquad (5.15)$$

where $z^t_{\sigma_2} = \{z(s),\ \sigma_2 \le s \le t\}$.

Here (5.14) follows by the definition of the conditional density, and
(5.15) is due to the Markov property of the process $(\xi, z)$ [J1]. Each
of the densities in (5.15) is the result of a linear smoothing operation;
hence, each is Gaussian with nonrandom covariance $P_{\sigma_1|\sigma_2}(t)$ and
$P(\sigma_2, \sigma_2, t)$, respectively [L10]. Also, for $\sigma > 0$ [K12], [G8],

$$P(\sigma, \sigma, t) = [P^{-1}(\sigma) + P_B^{-1}(\sigma)]^{-1}$$

where $P_B$ is the error covariance of a Kalman filter running backward
in time from $t$ to $\sigma$, and $P_B^{-1}(t) = 0$. Due to the controllability
of $[F, G]$, $P(\sigma)$ is invertible for all $\sigma > 0$ and $P_B(\sigma)$ is invertible
for all $\sigma < t$ [W18]; consequently, $P(\sigma, \sigma, t)$ is invertible for all
$0 < \sigma \le t$. By the formula for the conditional covariance of a
Gaussian distribution [J1], we have for $0 \le \sigma_1 < \sigma_2 \le t$

$$P_{\sigma_1|\sigma_2}(t) = P(\sigma_1, \sigma_1, t) - P(\sigma_1, \sigma_2, t)\, P^{-1}(\sigma_2, \sigma_2, t)\, P'(\sigma_1, \sigma_2, t) \qquad (5.16)$$

Since $P(\sigma_1, \sigma_2, t)$, $0 \le \sigma_1 < \sigma_2 \le t$, can be computed from (5.16), it is
also nonrandom; and since we have shown previously that $P(\sigma, \sigma, t)$ is
nonrandom, $P(\sigma_1, \sigma_2, t)$ is nonrandom for all $0 \le \sigma_1, \sigma_2 \le t$. ■


This lemma allows the off-line computation of $P(\sigma_1, \sigma_2, t)$ via the
equations of Kwakernaak [K11] (for $\sigma_1 \le \sigma_2$)

$$P(\sigma_1, \sigma_2, t) = P(\sigma_1)\,\Psi'(\sigma_2, \sigma_1) - P(\sigma_1) \left[ \int_{\sigma_2}^{t} \Psi'(\tau, \sigma_1)\, H'(\tau) R^{-1}(\tau) H(\tau)\, \Psi(\tau, \sigma_2)\, d\tau \right] P(\sigma_2) \qquad (5.17)$$

$$\frac{d}{dt}\Psi(t, \tau) = [F(t) - P(t) H'(t) R^{-1}(t) H(t)]\, \Psi(t, \tau); \quad \Psi(\tau, \tau) = I \qquad (5.18)$$

where the Kalman filter error covariance matrix $P(t) = P(t, t, t)$ is
computed via the Riccati equation

$$\dot{P}(t) = F(t) P(t) + P(t) F'(t) + Q(t) - P(t) H'(t) R^{-1}(t) H(t) P(t); \quad P(0) = P_0 \qquad (5.19)$$


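Recall that the characteristic function of a Gaussian random vector
$y$ with mean $m$ and covariance $P$ is given by

$$M_y(u) = E[\exp(iu'y)] = \exp[iu'm - \tfrac{1}{2} u'Pu] \qquad (5.20)$$

Hence, by taking partial derivatives of the characteristic function
(see Lemma D.1), we have

$$E^t[x(t)] = \int_0^t \gamma_1(\sigma)\, E^t[\xi_{k_1}(\sigma)\, e^{\xi_j(t)}]\, d\sigma$$

$$\qquad = \int_0^t \gamma_1(\sigma)\, [\hat{\xi}_{k_1}(\sigma|t) + P_{k_1 j}(\sigma, t, t)]\, e^{\hat{\xi}_j(t|t) + \frac{1}{2} P_{jj}(t)}\, d\sigma$$

$$\qquad = \left[ \int_0^t \gamma_1(\sigma) P_{k_1 j}(\sigma, t, t)\, d\sigma + E^t\!\left[ \int_0^t \gamma_1(\sigma)\, \xi_{k_1}(\sigma)\, d\sigma \right] \right] e^{\hat{\xi}_j(t|t) + \frac{1}{2} P_{jj}(t)}$$

$$\qquad = \left[ \int_0^t \gamma_1(\sigma) P_{k_1 j}(\sigma, t, t)\, d\sigma + \hat{\eta}(t|t) \right] e^{\hat{\xi}_j(t|t) + \frac{1}{2} P_{jj}(t)} \qquad (5.21)$$

Since the first term in (5.21) is nonrandom and $\hat{\eta}(t|t)$ and $\hat{\xi}_j(t|t)$ can
be computed with a Kalman-Bucy filter, $\hat{x}(t|t)$ is indeed FDC for the
case $j = 1$.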
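
As a numerical illustration (not part of the original development), the
step from the first to the second line of (5.21) rests on the identity
E[a exp(b)] = (E[a] + cov(a,b)) exp(E[b] + var(b)/2) for jointly Gaussian
scalars a and b. The sketch below checks that identity by brute-force
quadrature; the function names and the numerical values are illustrative
assumptions, not quantities from the text.

```python
import math


def closed_form(m_a, m_b, var_b, cov_ab):
    """E[a * exp(b)] for jointly Gaussian scalars (a, b).

    The mean of a is shifted by cov(a, b) and multiplied by the
    lognormal mean of exp(b); this is the identity used in (5.21).
    """
    return (m_a + cov_ab) * math.exp(m_b + 0.5 * var_b)


def quadrature(m_a, m_b, var_a, var_b, cov_ab, n=300, width=9.0):
    """Midpoint-rule estimate of E[a * exp(b)].

    (a, b) is built from independent standard normals (u, v):
        b = m_b + s_b * u
        a = m_a + (cov_ab / s_b) * u + s_res * v
    which reproduces the requested means, variances and covariance.
    """
    s_b = math.sqrt(var_b)
    s_res = math.sqrt(var_a - cov_ab ** 2 / var_b)
    h = 2.0 * width / n
    norm = 1.0 / (2.0 * math.pi)
    total = 0.0
    for i in range(n):
        u = -width + (i + 0.5) * h
        b = m_b + s_b * u
        for j in range(n):
            v = -width + (j + 0.5) * h
            a = m_a + (cov_ab / s_b) * u + s_res * v
            density = norm * math.exp(-0.5 * (u * u + v * v))
            total += a * math.exp(b) * density
    return total * h * h
```

In (5.21) the roles are a = $\xi_{k_1}(\sigma)$ and b = $\xi_j(t)$, with all moments
taken conditionally on the observations.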

The induction step of the proof of Theorem 5.1 is given in Appendix
D. A crucial component of the proof is Lemma D.1, which expresses higher
order moments of a Gaussian distribution in terms of the lower moments.
Notice that in equation (5.21) we have interchanged the operations of
integration and conditional expectation. This is justified by the
version of the Fubini theorem proved in Appendix C; since we will be
dealing only with integrals of products of Gaussian random processes,
the use of the Fubini theorem is easily justified, and we will use it
without further comment.

The proof of Theorem 5.2 is almost identical to that of Theorem
5.1; the differences are explained in Appendix D. We now present an
example to illustrate the basic concepts of these theorems; this
example is a special case of Theorem 5.2. However, we will need one
preliminary lemma.

Lemma 5.2: The conditional cross-covariance satisfies

$$P(\sigma, t, t) = K(t, \sigma)\, P(t) \qquad (5.22)$$

where

$$\frac{d}{dt} K'(t, \sigma) = -[F'(t) + P^{-1}(t) Q(t)]\, K'(t, \sigma); \quad K'(\sigma, \sigma) = I \qquad (5.23)$$

Proof: Let

$$P(\sigma, t) \triangleq E[(\xi(\sigma) - \hat{\xi}(\sigma|\sigma))(\xi(t) - \hat{\xi}(t|t))']$$

and consider

$$P(\sigma, t, t) - P(\sigma, t) = E[(\hat{\xi}(\sigma|\sigma) - \hat{\xi}(\sigma|t))(\xi(t) - \hat{\xi}(t|t))' \mid z^t]$$

Since $\hat{\xi}(\sigma|\sigma) - \hat{\xi}(\sigma|t)$ is measurable with respect to the $\sigma$-field $\sigma(z^t)$,
the projection theorem [R3] implies that $P(\sigma, t, t) - P(\sigma, t) = 0$. The proof
is concluded by noting that [K12]

$$P(\sigma, t) = K(t, \sigma)\, P(t)$$ ■


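Example 5.1: Consider the system described by

$$\begin{bmatrix} d\xi_1(t) \\ d\xi_2(t) \end{bmatrix} = \begin{bmatrix} -\alpha & 0 \\ 0 & -\beta \end{bmatrix} \begin{bmatrix} \xi_1(t) \\ \xi_2(t) \end{bmatrix} dt + \begin{bmatrix} dw_1(t) \\ dw_2(t) \end{bmatrix} \qquad (5.24)$$

$$dx(t) = (-\gamma x(t) + \xi_1(t)\,\xi_2(t))\,dt \qquad (5.25)$$

$$\begin{bmatrix} dz_1(t) \\ dz_2(t) \end{bmatrix} = \begin{bmatrix} \xi_1(t) \\ \xi_2(t) \end{bmatrix} dt + \begin{bmatrix} dv_1(t) \\ dv_2(t) \end{bmatrix} \qquad (5.26)$$

where $\alpha, \beta, \gamma > 0$; $w_1$, $w_2$, $v_1$, and $v_2$ are independent, zero mean, unit
variance Wiener processes; $\xi_1(0)$ and $\xi_2(0)$ are independent Gaussian
random variables which are also independent of the noise processes;
and $x(0) = 0$ (see Figure 5.1).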

Figure 5.1 Block Diagram of the System of Example 5.1

The conditional expectation $\hat{x}(t|t)$ satisfies the nonlinear filtering
equation (1.7)-(1.8):

$$d\hat{x}(t|t) = E^t[-\gamma x(t) + \xi_1(t)\xi_2(t)]\,dt + \left\{ E^t\!\left[ \int_0^t e^{-\gamma(t-s)} \xi_1(s)\xi_2(s)\,ds\;\, \xi'(t) \right] - E^t\!\left[ \int_0^t e^{-\gamma(t-s)} \xi_1(s)\xi_2(s)\,ds \right] \hat{\xi}'(t|t) \right\} d\nu(t) \qquad (5.27)$$

where $\xi(t) = [\xi_1(t), \xi_2(t)]'$ and the innovations process $\nu$ is given by

$$d\nu(t) = dz(t) - \hat{\xi}(t|t)\,dt \qquad (5.28)$$

Recall that the conditional covariance $P(t)$ of $\xi(t)$ given $z^t$
satisfies the Riccati equation (5.19). Since $\xi_1(0)$ and $\xi_2(0)$ are
independent, it is not difficult to show that $P_{12}(t) = P_{21}(t) = 0$ for
all $t$. From (5.22)-(5.23) we can compute

$$P(\sigma, t, t) = \begin{bmatrix} P_{11}(t) \exp\!\left[ \alpha(t-\sigma) - \int_\sigma^t P_{11}^{-1}(s)\,ds \right] & 0 \\ 0 & P_{22}(t) \exp\!\left[ \beta(t-\sigma) - \int_\sigma^t P_{22}^{-1}(s)\,ds \right] \end{bmatrix} \qquad (5.29)$$


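These facts and equation (D.3a) imply that the transpose of the gain
term in (5.27) is

$$E^t\!\left[ \int_0^t e^{-\gamma(t-s)} \xi_1(s)\xi_2(s)\,\xi(t)\,ds \right] - E^t\!\left[ \int_0^t e^{-\gamma(t-s)} \xi_1(s)\xi_2(s)\,ds \right] \hat{\xi}(t|t)$$

$$\qquad = \int_0^t e^{-\gamma(t-s)} \left( E^t[\xi_1(s)\xi_2(s)\xi(t)] - E^t[\xi_1(s)\xi_2(s)]\, E^t[\xi(t)] \right) ds$$

$$\qquad = E^t\!\left[ \int_0^t e^{-\gamma(t-s)} \begin{bmatrix} P_{11}(s,t,t) & 0 \\ 0 & P_{22}(s,t,t) \end{bmatrix} \begin{bmatrix} \xi_2(s) \\ \xi_1(s) \end{bmatrix} ds \right] \qquad (5.30a)$$

$$\qquad = \begin{bmatrix} \hat{\eta}_1(t|t)\, P_{11}(t) \\ \hat{\eta}_2(t|t)\, P_{22}(t) \end{bmatrix} \qquad (5.30b)$$

where

$$\begin{bmatrix} \dot{\eta}_1(t) \\ \dot{\eta}_2(t) \end{bmatrix} = \begin{bmatrix} \alpha - \gamma - P_{11}^{-1}(t) & 0 \\ 0 & \beta - \gamma - P_{22}^{-1}(t) \end{bmatrix} \begin{bmatrix} \eta_1(t) \\ \eta_2(t) \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \xi_1(t) \\ \xi_2(t) \end{bmatrix}; \quad \eta_1(0) = \eta_2(0) = 0 \qquad (5.31)$$

In other words, the argument of the conditional expectation in (5.30a)
can be realized as the output of a finite dimensional linear system
with state $\eta(t) \triangleq [\eta_1(t), \eta_2(t)]'$ satisfying (5.31).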

Thus the finite dimensional optimal estimator for the system (5.24)-
(5.26) is constructed as follows (see Figure 5.2). First we augment the
state $\xi$ of (5.24) with the state $\eta$ of (5.31). Then the Kalman-Bucy filter
for the linear system (5.24), (5.31), with observations (5.26), computes
the conditional expectations $\hat{\xi}(t|t)$ and $\hat{\eta}(t|t)$. Finally,

$$d\hat{x}(t|t) = [-\gamma \hat{x}(t|t) + \hat{\xi}_1(t|t)\,\hat{\xi}_2(t|t)]\,dt + \hat{\eta}'(t|t)\, P(t)\, d\nu(t); \quad \hat{x}(0|0) = 0 \qquad (5.32)$$


We now discuss the steady-state behavior of the optimal filter.
Since the linear system (5.24) is asymptotically stable (and hence
detectable) and controllable, the Riccati equation (5.19) has a unique
positive-definite steady-state solution $P$ [W18]; a simple computation
shows that

$$P = \begin{bmatrix} -\alpha + \sqrt{\alpha^2 + 1} & 0 \\ 0 & -\beta + \sqrt{\beta^2 + 1} \end{bmatrix} \qquad (5.33)$$

Thus, in steady-state, the augmented linear system (5.24), (5.31) is
time-invariant. Now consider the eigenvalues of (5.31) in steady-state:

$$\alpha - \gamma - P_{11}^{-1} = \alpha - \gamma - (-\alpha + \sqrt{\alpha^2 + 1})^{-1} = -\gamma - \sqrt{\alpha^2 + 1}$$

$$\beta - \gamma - P_{22}^{-1} = \beta - \gamma - (-\beta + \sqrt{\beta^2 + 1})^{-1} = -\gamma - \sqrt{\beta^2 + 1}$$

Consequently, the augmented linear system is also asymptotically stable
and controllable in steady-state. Let the conditional covariance matrix
of the augmented state $[\xi'(t), \eta'(t)]'$ given $z^t$ be denoted by $S(t)$. Then
the Riccati equation satisfied by $S(t)$ has a unique positive-definite
steady-state solution $S$ (notice that $S_{11} = P_{11}$ and $S_{22} = P_{22}$).
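
The steady-state quantities above are easy to check numerically. The
following sketch (an illustration added here, with hypothetical function
names and parameter values) integrates the scalar Riccati equation for one
channel of Example 5.1, for which F = -alpha and Q = H = R = 1, and compares
the limit with the closed-form variance -alpha + sqrt(alpha^2 + 1); it also
evaluates the steady-state eigenvalue alpha - gamma - 1/P11 of (5.31).

```python
import math


def riccati_rhs(p, alpha):
    """Scalar Riccati equation (5.19) with F = -alpha and Q = H = R = 1."""
    return -2.0 * alpha * p + 1.0 - p * p


def integrate_riccati(p0, alpha, t_end=20.0, dt=1e-3):
    """Classical fourth-order Runge-Kutta integration of the Riccati ODE."""
    p = p0
    for _ in range(int(round(t_end / dt))):
        k1 = riccati_rhs(p, alpha)
        k2 = riccati_rhs(p + 0.5 * dt * k1, alpha)
        k3 = riccati_rhs(p + 0.5 * dt * k2, alpha)
        k4 = riccati_rhs(p + dt * k3, alpha)
        p += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


def steady_state_variance(alpha):
    """Positive root of p**2 + 2*alpha*p - 1 = 0."""
    return -alpha + math.sqrt(alpha * alpha + 1.0)


def eta_eigenvalue(alpha, gamma):
    """Steady-state eigenvalue alpha - gamma - 1/P11 of the eta dynamics."""
    return alpha - gamma - 1.0 / steady_state_variance(alpha)
```

The two channels decouple because the off-diagonal covariance is zero, so
the scalar equation suffices for each diagonal entry.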


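The steady-state Kalman-Bucy filter [J1] for the augmented system
(5.24), (5.31) is easily computed to be

$$\begin{bmatrix} d\hat{\xi}_1(t|t) \\ d\hat{\xi}_2(t|t) \\ d\hat{\eta}_1(t|t) \\ d\hat{\eta}_2(t|t) \end{bmatrix} = \begin{bmatrix} -\alpha & 0 & 0 & 0 \\ 0 & -\beta & 0 & 0 \\ 0 & 1 & -\gamma - \sqrt{\alpha^2+1} & 0 \\ 1 & 0 & 0 & -\gamma - \sqrt{\beta^2+1} \end{bmatrix} \begin{bmatrix} \hat{\xi}_1(t|t) \\ \hat{\xi}_2(t|t) \\ \hat{\eta}_1(t|t) \\ \hat{\eta}_2(t|t) \end{bmatrix} dt + \begin{bmatrix} P_{11} & 0 \\ 0 & P_{22} \\ 0 & S_{23} \\ S_{14} & 0 \end{bmatrix} \begin{bmatrix} d\nu_1(t) \\ d\nu_2(t) \end{bmatrix} \qquad (5.34)$$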


where

$$S_{14} = \frac{P_{11} P_{22}}{P_{11} P_{22} + (\alpha - \beta + \gamma) P_{22} + 1}, \qquad S_{23} = \frac{P_{11} P_{22}}{P_{11} P_{22} + (\beta - \alpha + \gamma) P_{11} + 1}$$

(here $P_{11}$ and $P_{22}$ are defined in (5.33)). The conditional expectation
$\hat{x}(t|t)$ is computed according to

$$d\hat{x}(t|t) = [-\gamma \hat{x}(t|t) + \hat{\xi}_1(t|t)\,\hat{\xi}_2(t|t)]\,dt + \hat{\eta}'(t|t)\, P\, d\nu(t); \quad \hat{x}(0|0) = 0 \qquad (5.35)$$

which is a nonlinear, time-invariant equation. The steady-state optimal
filter is illustrated in Figure 5.3.
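
As a consistency check on the steady-state gains (a sketch added for
illustration; the function names and test values are assumptions), the
cross-covariances S14 and S23 should make the (1,4) and (2,3) entries of the
algebraic Riccati equation A S + S A' + Q - S H'H S = 0 vanish, where A is
the dynamics matrix of the augmented system, H = [I 0], R = I, and
S11 = P11, S22 = P22, S12 = S13 = S24 = 0.

```python
import math


def p_ss(a):
    """Steady-state filter variance -a + sqrt(a^2 + 1) for decay rate a."""
    return -a + math.sqrt(a * a + 1.0)


def s14(alpha, beta, gamma):
    """Closed-form steady-state cross-covariance between xi_1 and eta_2."""
    p11, p22 = p_ss(alpha), p_ss(beta)
    return p11 * p22 / (p11 * p22 + (alpha - beta + gamma) * p22 + 1.0)


def s23(alpha, beta, gamma):
    """Closed-form steady-state cross-covariance between xi_2 and eta_1."""
    p11, p22 = p_ss(alpha), p_ss(beta)
    return p11 * p22 / (p11 * p22 + (beta - alpha + gamma) * p11 + 1.0)


def residual_14(alpha, beta, gamma):
    """(1,4) entry of A S + S A' + Q - S H'H S for the augmented system."""
    p11 = p_ss(alpha)
    s = s14(alpha, beta, gamma)
    a44 = -gamma - math.sqrt(beta * beta + 1.0)
    # (AS)_14 = -alpha*S14, (SA')_14 = S11*1 + S14*a44, (SH'HS)_14 = S11*S14
    return -alpha * s + p11 + a44 * s - p11 * s


def residual_23(alpha, beta, gamma):
    """(2,3) entry of the same algebraic Riccati equation."""
    p22 = p_ss(beta)
    s = s23(alpha, beta, gamma)
    a33 = -gamma - math.sqrt(alpha * alpha + 1.0)
    # (AS)_23 = -beta*S23, (SA')_23 = S22*1 + S23*a33, (SH'HS)_23 = S22*S23
    return -beta * s + p22 + a33 * s - p22 * s
```

Both residuals vanish to rounding error for any positive alpha, beta, gamma,
which also confirms that S14 simplifies to P11 / (gamma + sqrt(alpha^2 + 1) + sqrt(beta^2 + 1)).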

We note that the stability of the original linear system is not
necessary for the existence of the steady-state optimal filter in this
example; in fact, a weaker sufficient condition is the detectability
[W18] of the linear system (5.24), (5.26) and the positivity of $\gamma$ in
(5.25). The generalization of this result to other systems is presently
being investigated.

The basic technique in Example 5.1 and in the proofs of Theorems
5.1 and 5.2 is the augmentation of the state of the original system with
the processes which are required in the nonlinear filtering equation.
For the classes of systems considered here, we prove that only a finite
number of additional states are required. An alternate interpretation
is that we need only compute a finite number of the smoothed statistics
of $\xi$.


5.3 Finite Dimensional Estimators for Bilinear Systems

In this section we will use the results of the previous section 
and some results from Lie theory to prove that the optimal estimators 
for certain bilinear systems are finite dimensional. We note here that 
as early as 1965, Kalman [K10] conjectured: "It might be that algebraic 
methods, reminiscent of the way in which Lie groups are used to study 
nonlinear differential equations, will give us the first explicit, 
nontrivial, nonlinear filters." The results of this section will show
that Kalman was, in a sense, correct.

Figure 5.3 Block Diagram of the Steady-State Optimal Filter for Example 5.1

Consider the system described by (5.1), (5.3), and

$$\dot{X}(t) = \left( A_0 + \sum_{i=1}^{N} \xi_i(t) A_i \right) X(t); \quad X(0) = I \qquad (5.36)$$

where $X(t)$ is a $k \times k$ matrix. We will explicitly use the structure of the
Lie algebra $\mathcal{L} = \{A_0, A_1, \ldots, A_N\}_{LA}$ and the ideal $\mathcal{L}_0$ in $\mathcal{L}$ generated by
$\{A_1, \ldots, A_N\}$ in the determination of finite dimensional estimators.


Theorem 5.3: Consider the system described by (5.1), (5.3), and
(5.36), and assume that $\mathcal{L}_0$ is nilpotent (see Appendix A). Then the
conditional expectation $\hat{X}(t|t)$ can be computed with a finite dimensional
system of nonlinear stochastic differential equations driven by the
innovations $d\nu(t) = dz(t) - H(t)\hat{\xi}(t|t)\,dt$.

It can easily be shown that if $\mathcal{L}_0$ is nilpotent, then $\mathcal{L}$ is solvable;
however, the converse is not true. Hence, $\mathcal{L}$ is always solvable in
Theorem 5.3.


Notice that the model considered in Theorem 5.3 is the same as the
strapdown navigation model of Section 4.2. However, in the navigation
model $\mathcal{L} = so(3)$ is not solvable (in fact, it is simple), so Theorem 5.3
does not apply. Suboptimal estimation techniques which can be applied
to the navigation problem are discussed in Section 5.4.

Theorem 5.3 is proved via a series of lemmas which reduce the
estimation problem to the case in which $\mathcal{L}$ is a particular nilpotent
Lie algebra. The first lemma generalizes a result of Willsky [W4],
Brockett [B1], and Krener [K6] (the proof is analogous and will be
omitted).

Lemma 5.3: Consider the system (5.1), (5.3), (5.36), and define
the $k \times k$ matrix-valued process

$$Y(t) = e^{-A_0 t}\, X(t)$$

Then there exists a deterministic matrix-valued function $D(t)$ such that
$Y$ satisfies

$$\dot{Y}(t) = \left[ \sum_{i=1}^{M} H_i\, y_i(t) \right] Y(t); \quad Y(0) = I \qquad (5.37)$$

where $\{H_1, \ldots, H_M\}$ is a basis for $\mathcal{L}_0$ and

$$y(t) = D(t)\,\xi(t) \qquad (5.38)$$

In addition, $\hat{X}$ can be computed according to

$$\hat{X}(t|t) = e^{A_0 t}\, \hat{Y}(t|t) \qquad (5.39)$$

Lemma 5.3 enables us, without loss of generality, to examine the
estimation problem for $Y(t)$ evolving on the normal subgroup $G_0 = \{\exp \mathcal{L}_0\}_G$,
rather than for $X(t)$ evolving on the full Lie group $G = \{\exp \mathcal{L}\}_G$. Thus
we need only consider the case in which $A_0 = 0$ and $\mathcal{L} = \mathcal{L}_0$ is nilpotent
in order to prove Theorem 5.3.

By means of Lemma A.2, the problem can be further reduced to the
consideration of Lie algebras in nilpotent canonical form (see equation
(A.6)).

Lemma 5.4: Consider the system (5.1), (5.3), (5.36), where $A_0 = 0$
and $\mathcal{L}$ is nilpotent. Then there exists a (possibly complex-valued)
nonsingular matrix $P$ such that

$$\hat{X}(t|t) = P^{-1}\, \hat{Y}(t|t)\, P \qquad (5.40)$$

where $Y$ satisfies (5.37) and $\{H_1, \ldots, H_N\}$ are in nilpotent canonical
form.

Proof: According to Lemma A.2, there exists a nonsingular matrix $P$
such that $P \mathcal{L} P^{-1}$ is in nilpotent canonical form. If we define
$H_i = P A_i P^{-1}$, then $X(t) = P^{-1} Y(t) P$, where $Y$ satisfies (5.37). Hence
$\hat{X}(t|t) = P^{-1} \hat{Y}(t|t) P$ and the lemma is proved. ■


Finally, by means of the following trivial lemma, we reduce the 
problem to the consideration of one block in the nilpotent canonical 
form. 


Lemma 5.5: Consider the system (5.1), (5.3), (5.36), where $A_0 = 0$
and $\{A_1, \ldots, A_N\}$ are in nilpotent canonical form. Then $X(t)$ has a block
diagonal form conformable with that of $\{A_1, \ldots, A_N\}$.

Let gn(m) denote the Lie algebra of upper triangular $m \times m$ matrices
with equal diagonal elements. Then Lemma 5.5 implies that the bilinear
system (5.36) can be viewed as the "direct sum" of a number of decoupled
$k_j$-dimensional subsystems

$$\dot{X}^j(t) = \left[ \sum_{i=1}^{N} \xi_i(t)\, A_i^j \right] X^j(t); \quad X^j(0) = I$$


where $A_1^j, \ldots, A_N^j$ belong to gn($k_j$). Hence Theorem 5.3 will be established
when we prove the following lemma.

Lemma 5.6: Consider the system (5.1), (5.3), (5.36), where $A_0 = 0$
and $\{A_1, \ldots, A_N\} \in$ gn(k). Then each element of the solution $X(t)$ of
(5.36) can be expressed in the form

$$\exp\!\left( \sum_{i=1}^{N} \alpha_i \int_0^t \xi_i(s)\,ds \right) \eta(t) \qquad (5.41)$$

where $\eta$ is a finite Volterra series in $\xi$ with separable kernels. Hence,
Theorem 5.1 implies that $\hat{X}(t|t)$ can be computed with a finite dimensional
system of nonlinear stochastic differential equations.

Proof: Since $\{A_1, \ldots, A_N\} \in$ gn(k), the bilinear equation (5.36) can
be rewritten in the form

$$\dot{X}(t) = \left[ \left( \sum_{i=1}^{N} \alpha_i\, \xi_i(t) \right) I + \sum_{i=1}^{N} \xi_i(t)\, B_i \right] X(t) \qquad (5.42)$$

where $\alpha_i$ are constants, $I$ denotes the $k \times k$ identity matrix, and $B_1, \ldots, B_N$
are strictly upper triangular (zero on the diagonal). It is easy to
show that

$$X(t) = \exp\!\left( \sum_{i=1}^{N} \alpha_i \int_0^t \xi_i(s)\,ds \right) Y(t)$$

where $Y$ satisfies

$$\dot{Y}(t) = \left[ \sum_{i=1}^{N} \xi_i(t)\, B_i \right] Y(t); \quad Y(0) = I \qquad (5.43)$$

Since the $\{B_i\}$ are strictly upper triangular, the solution of (5.43) can
be written as a finite Peano-Baker (Volterra) series [B25], and each
element of $X(t)$ can be expressed in the form (5.41). ■

Theorem 5.3 can be generalized to include certain time-varying
bilinear systems; the proof is identical.

Theorem 5.4: Consider the system described by (5.1), (5.3) and

$$\dot{X}(t) = \left[ A_0(t) + \sum_{i=1}^{N} \xi_i(t)\, A_i \right] X(t); \quad X(0) = I \qquad (5.44)$$

Let $\mathcal{L} = \{A_1, \ldots, A_N, A_0(t)\}_{LA}$ and let $\mathcal{L}_0$ be the ideal in $\mathcal{L}$ generated
by $\{A_1, \ldots, A_N\}$. Assume that $\mathcal{L}_0$ is nilpotent. Then $\hat{X}(t|t)$ can be computed
with a finite dimensional system of nonlinear stochastic differential
equations.

We note that if $A_0(t)$ is time-varying, the nilpotency of $\mathcal{L}_0$ does
not imply that $\mathcal{L}$ is solvable. Hence, in contrast to Theorem 5.3, $X(t)$
in Theorem 5.4 need not evolve on a solvable Lie group.
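
The finiteness of the Peano-Baker series invoked in the proof of Lemma 5.6
comes from a simple algebraic fact: any product of k strictly upper
triangular k x k matrices is zero, so every term of order k or higher
vanishes. The sketch below illustrates this for k = 3; the matrices and
helper names are hypothetical, and the constant-input case is used only
because the series then reduces to the exponential series.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_zero(m, tol=0.0):
    """True if every entry of m has magnitude at most tol."""
    return all(abs(x) <= tol for row in m for x in row)


def constant_input_solution(b, t, order):
    """Partial sum I + tB + (tB)^2/2! + ... up to the given order.

    For a constant input, the Peano-Baker series of Ydot = B Y is the
    exponential series, which is enough to exhibit the truncation.
    """
    n = len(b)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, order + 1):
        term = matmul(term, [[t * x / m for x in row] for row in b])
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result
```

For a 3 x 3 strictly upper triangular matrix, the partial sum of order 2 is
already the exact solution; adding further terms changes nothing.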

The following example illustrates how Hirschorn's bilinearization
technique (see Chapter 2) can be used to place the series interconnection
of two bilinear systems in the framework of Theorem 5.3.

Example 5.2: Consider the series interconnection of two bilinear
systems described by [H2]

$$\dot{x}_1(t) = [A_0 + \xi_1(t) A_1 + \xi_2(t) A_2]\, x_1(t); \quad x_1(0) = x_{10} \qquad (5.45)$$

$$\dot{x}_2(t) = [B_0 + C x_1(t)\, B_1]\, x_2(t); \quad x_2(0) = x_{20} \qquad (5.46)$$

where $x_1 \in R^3$, $x_2 \in R^3$, $C = [0, 1, 1]$, and $A_0$, $A_1$, $A_2$, $B_0$, $B_1$ are
the constant $3 \times 3$ matrices given in (5.47).

[Equation (5.47): the entries of the matrices $A_0$, $A_1$, $A_2$, $B_0$, $B_1$ are not legible in this copy.]


Hirschorn shows that the system (5.45)-(5.46) can be bilinearized; i.e.,
there exists an 8x8 matrix bilinear system

$$\dot{X}(t) = [F_0 + \xi_1(t) F_1 + \xi_2(t) F_2]\, X(t); \quad X(0) = I \qquad (5.48)$$

such that $y(t) = [x_1'(t), x_2'(t)]'$ is given by

$$y(t) = [I_6 \;\; 0]\, X(t) \begin{bmatrix} y(0) \\ y_3(0)\, y_6(0) \\ y_2(0)\, y_6(0) \end{bmatrix} \qquad (5.49)$$

In addition, $\mathcal{L} = \{F_0, F_1, F_2\}_{LA}$ is nilpotent. Thus in this case the
bilinearization of (5.45)-(5.46) is accomplished merely by augmenting
the state $y$; the augmented system is in fact bilinear. If the initial
state $y(0)$ is assumed to be independent of $\xi_1(t)$ and $\xi_2(t)$ for all $t$,
then

$$\hat{y}(t|t) = [I_6 \;\; 0]\, \hat{X}(t|t) \begin{bmatrix} E[y(0)] \\ E[y_3(0)\, y_6(0)] \\ E[y_2(0)\, y_6(0)] \end{bmatrix} \qquad (5.50)$$

Since $\mathcal{L}$ is nilpotent, Theorem 5.3 implies that $\hat{X}(t|t)$, and hence $\hat{y}(t|t)$,
are computable with a finite dimensional filter.

This is a very simple example, the results of which could also have
been obtained by solving (5.45)-(5.46) explicitly and applying Theorem
5.1. In general, one must be careful in applying techniques such as
bilinearization to estimation problems. Notice that the action (5.49)
of $X(t)$ on $y(0)$ is linear in $X(t)$; if it had been nonlinear (as is the
case for a general bilinearization problem [H2]), the method would not
have worked. Also, recall from Section 2.1 that the Lie group $G(D)$
associated with the nonlinear system (see equation (2.9)) may not have
a matrix representation; in such cases, the procedure of Example 5.2
cannot be used.

5.4 General Linear-Analytic Systems — Suboptimal Estimators 

In this section we present an example to demonstrate that the
results of the previous sections cannot be generalized to much larger
classes of systems; in fact, we will show that Theorem 5.3 cannot even
be generalized to the case in which $\mathcal{L}$ is solvable but $\mathcal{L}_0$ is not
nilpotent. We will then present a suboptimal estimation procedure for
linear-analytic systems driven by colored noise.


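Example 5.3: Consider the estimation of $X$ with observations $z$, as
described in (5.1), (5.3), (5.36), in which $\mathcal{L}$ is the most elementary
non-nilpotent Lie algebra. That is, let $n = N = 3$, $k = 2$, $A_0 = 0$, and

$$A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \qquad (5.51)$$

The solution of (5.36) can then be expressed in closed form as

$$X(t) = \begin{bmatrix} e^{y_1(t)} & \int_0^t \xi_2(\tau)\, e^{\eta(\tau,t)}\, d\tau \\ 0 & e^{y_3(t)} \end{bmatrix} \qquad (5.52)$$

where

$$\eta(\tau, t) = \int_\tau^t \xi_1(\sigma)\, d\sigma + \int_0^\tau \xi_3(\sigma)\, d\sigma \qquad (5.53)$$

and

$$y_i(t) = \int_0^t \xi_i(\sigma)\, d\sigma \qquad (5.54)$$

Using the characteristic function (5.20), we see that

$$\hat{X}_{11}(t|t) = \exp\!\left[ \hat{y}_1(t|t) + \tfrac{1}{2} \int_0^t\!\!\int_0^t P_{11}(\sigma_1, \sigma_2, t)\, d\sigma_2\, d\sigma_1 \right] \qquad (5.55)$$

$$\hat{X}_{22}(t|t) = \exp\!\left[ \hat{y}_3(t|t) + \tfrac{1}{2} \int_0^t\!\!\int_0^t P_{33}(\sigma_1, \sigma_2, t)\, d\sigma_2\, d\sigma_1 \right] \qquad (5.56)$$

Thus the only difficulty is in the computation of $\hat{X}_{12}(t|t)$. But

$$\hat{X}_{12}(t|t) = \int_0^t E^t[\xi_2(\tau)\, e^{\eta(\tau,t)}]\, d\tau \qquad (5.57)$$

and, by Lemma D.1,

$$E^t[\xi_2(\tau)\, e^{\eta(\tau,t)}] = [\hat{\xi}_2(\tau|t) + \alpha(\tau,t)] \cdot \exp\!\left[ \int_\tau^t \hat{\xi}_1(\sigma|t)\, d\sigma + \int_0^\tau \hat{\xi}_3(\sigma|t)\, d\sigma + \delta(\tau,t) \right] \qquad (5.58)$$

where

$$\alpha(\tau,t) = \int_\tau^t P_{12}(\sigma, \tau, t)\, d\sigma + \int_0^\tau P_{32}(\sigma, \tau, t)\, d\sigma \qquad (5.59)$$

$$\delta(\tau,t) = \tfrac{1}{2} \int_\tau^t\!\!\int_\tau^t P_{11}(\sigma_1, \sigma_2, t)\, d\sigma_2\, d\sigma_1 + \tfrac{1}{2} \int_0^\tau\!\!\int_0^\tau P_{33}(\sigma_1, \sigma_2, t)\, d\sigma_2\, d\sigma_1 + \int_\tau^t\!\!\int_0^\tau P_{13}(\sigma_1, \sigma_2, t)\, d\sigma_2\, d\sigma_1 \qquad (5.60)$$

After substituting (5.58) into (5.57), it is clear that $\hat{X}_{12}(t|t)$ cannot
be computed with a finite dimensional system of equations; i.e., we must
compute an infinite number of smoothed functionals of $\xi$. Thus even the
least complicated non-nilpotent case does not fit into the framework of
the previous sections.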
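
The closed-form solution of Example 5.3 can be checked pathwise. The sketch
below (added for illustration) replaces the noise states by hypothetical
smooth test inputs, integrates the 2x2 bilinear equation with upper
triangular coefficient matrix [[xi_1, xi_2], [0, xi_3]] numerically, and
compares the result with the exponential and integral expressions for
X11, X12 and X22. The inputs, step counts and function names are assumptions
made only for this check.

```python
import math


def xi(t):
    """Hypothetical deterministic test inputs standing in for xi_1, xi_2, xi_3."""
    return math.cos(t), 1.0 + t, math.sin(2.0 * t)


def rhs(t, x):
    """Xdot = (xi_1 A_1 + xi_2 A_2 + xi_3 A_3) X for x = (X11, X12, X22)."""
    x1, x2, x3 = xi(t)
    return (x1 * x[0], x1 * x[1] + x2 * x[2], x3 * x[2])


def integrate(t_end, steps=4000):
    """Fourth-order Runge-Kutta integration starting from X(0) = I."""
    h = t_end / steps
    x = (1.0, 0.0, 1.0)
    t = 0.0
    for _ in range(steps):
        k1 = rhs(t, x)
        k2 = rhs(t + 0.5 * h, tuple(a + 0.5 * h * b for a, b in zip(x, k1)))
        k3 = rhs(t + 0.5 * h, tuple(a + 0.5 * h * b for a, b in zip(x, k2)))
        k4 = rhs(t + h, tuple(a + h * b for a, b in zip(x, k3)))
        x = tuple(a + h * (p + 2.0 * q + 2.0 * r + s) / 6.0
                  for a, p, q, r, s in zip(x, k1, k2, k3, k4))
        t += h
    return x


def closed_form(t_end, n=2000):
    """Evaluate the closed-form entries using antiderivatives of the inputs."""
    def y1(t):
        return math.sin(t)                      # integral of cos

    def y3(t):
        return 0.5 * (1.0 - math.cos(2.0 * t))  # integral of sin(2t)

    def integrand(tau):
        eta = (y1(t_end) - y1(tau)) + y3(tau)   # eta(tau, t)
        return (1.0 + tau) * math.exp(eta)

    h = t_end / n                               # composite Simpson rule, n even
    s = integrand(0.0) + integrand(t_end)
    s += 4.0 * sum(integrand((2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2.0 * sum(integrand(2 * k * h) for k in range(1, n // 2))
    return math.exp(y1(t_end)), s * h / 3.0, math.exp(y3(t_end))
```

The difficulty described in the example is not the pathwise formula but its
conditional expectation: the integrand of X12 is an exponential of a
functional that depends on tau, so a new smoothed statistic is needed for
every tau.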


For the system of Example 5.3 and for other nonlinear systems which 
require infinite dimensional optimal estimators, implementable suboptimal 
estimators must be designed. A general suboptimal estimation procedure 
is suggested by the finite dimensional estimators developed in this 
chapter. Consider the system (5.1)-(5.3), and assume that the linear-
analytic system (5.2) admits a Volterra series representation. Brockett
[B25] shows that the Volterra kernels of a linear-analytic realization
are necessarily separable. It is clear from Theorem 5.1 that a finite
dimensional suboptimal estimator for $\hat{x}(t|t)$ can be constructed by
truncating the Volterra series after a finite number of terms and 
computing the conditional expectation of the resulting finite Volterra 
series. Notice, however, that the dimension of the estimator increases 
rapidly with the number of terms retained. 

As an example of this procedure, consider the strapdown inertial
navigation system of Section 4.2, as described by (4.1), (4.3), and
(4.4). Since (4.1) evolves on the simple Lie group SO(3), which is
not even solvable, the computation of $\hat{X}(t|t)$ requires an infinite
dimensional estimator. The Volterra expansion of (4.1) is given by
the Peano-Baker series [B8]

$$X(t) = I - \sum_{i=1}^{3} R_i \int_0^t \xi_i(\sigma_1)\, d\sigma_1 + \sum_{i,j=1}^{3} R_i R_j \int_0^t\!\!\int_0^{\sigma_1} \xi_i(\sigma_1)\, \xi_j(\sigma_2)\, d\sigma_2\, d\sigma_1 - \cdots \qquad (5.61)$$

A suboptimal filter for the constrained least-squares estimate (see
Section 4.2) is obtained by truncating the series after N terms and
computing the conditional expectation $\bar{X}(t|t)$ of this finite series;
$\bar{X}(t|t)$ is an approximation to the true conditional expectation $\hat{X}(t|t)$.
The finite dimensional approximation to the constrained least-squares
estimate is (see (4.22))

$$\tilde{X}(t|t) = \bar{X}(t|t)\, [\bar{X}(t|t)'\, \bar{X}(t|t)]^{-1/2} \qquad (5.62)$$

An analogous suboptimal estimator can also be designed for a strapdown
inertial navigation system using quaternions (see Section 4.3).
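
The normalization in (5.62) maps a nonsingular matrix M to M (M'M)^{-1/2},
which is the orthogonal factor of its polar decomposition. One way to
compute it without forming a matrix square root is the Newton iteration
U <- (U + U^{-T})/2; this is a swapped-in numerical technique chosen for
illustration, not a method taken from the text, and the helper names and
the sample matrix are hypothetical.

```python
def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Matrix product for lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def orthogonal_factor(x, iterations=30):
    """Newton iteration U <- (U + U^{-T}) / 2, started at U = X.

    For nonsingular X this converges to the orthogonal polar factor,
    which equals X (X'X)^{-1/2}.
    """
    u = [row[:] for row in x]
    for _ in range(iterations):
        inv_t = transpose(inverse3(u))
        u = [[0.5 * (u[r][c] + inv_t[r][c]) for c in range(3)]
             for r in range(3)]
    return u
```

Applied to a truncated-series estimate that is close to, but not exactly, a
rotation matrix, the iteration returns the nearest orthogonal matrix in the
least-squares sense.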

In the next chapter, we present another class of suboptimal
estimators which are derived by means of some concepts from harmonic
analysis.

CHAPTER 6


THE USE OF HARMONIC ANALYSIS
IN SUBOPTIMAL FILTER DESIGN


6.1 Introduction

In this chapter we will study the bilinear system-linear measurement
estimation problem discussed at the end of Chapter 2. As discussed there,
the equations (2.30) for the computation of the conditional moments of $x$
are coupled, and thus represent an infinite dimensional estimator for
$\hat{x}(t|t)$. The purpose of this chapter is the design of suboptimal
estimators in the case that the bilinear system evolves on a compact Lie group
or homogeneous space.

The technique for suboptimal filter design developed here involves
the use of harmonic analysis on the appropriate Lie group or homogeneous
space (see Appendix B); thus we will explicitly take into account the
structure of the system. Several authors have used a similar approach
for systems defined on the circle [B9], [B14], [B19], [M9], [W6].
Our approach is a generalization of that of Willsky [W6], whose work
will be summarized in the next section. The technique of this chapter
is also related to the generalized least-square approximation method of
Center [C1].

The basic approach involves the definition of an "assumed density" 
form for the conditional density of x(t) given observations up to time 
t (see Chapter 1). Our method differs from most previous assumed density 
approximations in that our approximation is defined on the appropriate 
compact manifold (as opposed to Gaussian approximations, for example,
which are defined on $R^n$). The assumed density will be defined by an
expansion in terms of the eigenfunctions of the Laplace-Beltrami operator
on the manifold (see Section B.4).

The use of harmonic analysis will be motivated by a phase-tracking
example of Willsky [W6] in Section 6.2. In Section 6.3, we discuss the
general problem and show that we need only consider systems evolving on
the special orthogonal group SO(n) and the n-sphere $S^n$. Section 6.4
contains the application of the technique to systems evolving on $S^n$,
while Section 6.5 contains the application to systems on SO(n).

6.2 A Phase Tracking Problem on $S^1$

We first discuss a phase tracking problem studied by Bucy et al.
[B9] and Willsky [W6], in which the phase $\theta$ and the observation $z$ are
described by

$$d\theta(t) = \omega_c\, dt + q^{1/2}(t)\, dw(t), \quad \theta(0) = \theta_0 \qquad (6.1)$$

$$dz(t) = \sin\theta(t)\, dt + r^{1/2}(t)\, dv(t) \qquad (6.2)$$

where $v$ and $w$ are independent standard Brownian motion processes
independent of the random initial phase $\theta_0$. We wish to estimate $\theta(t) \bmod 2\pi$
given $z^t$, and we take as our optimal estimation criterion the
minimization of

$$E[(1 - \cos(\theta(t) - \hat{\theta}(t))) \mid z(s),\ 0 \le s \le t] \qquad (6.3)$$

Noting that we are essentially tracking a point on the unit circle
$S^1$ (a Lie group), we reformulate the problem in Cartesian coordinates.

Let

$$x_1 = \sin\theta(t), \quad x_2 = \cos\theta(t) \qquad (6.4)$$

Then

$$\begin{bmatrix} dx_1(t) \\ dx_2(t) \end{bmatrix} = \begin{bmatrix} -q(t)\,dt/2 & \omega_c\,dt + q^{1/2}(t)\,dw(t) \\ -(\omega_c\,dt + q^{1/2}(t)\,dw(t)) & -q(t)\,dt/2 \end{bmatrix} \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} \qquad (6.5)$$

$$dz(t) = x_1(t)\,dt + r^{1/2}(t)\,dv(t) \qquad (6.6)$$

which are of the bilinear process-linear measurement type discussed in
Chapters 2 and 4.


In Cartesian coordinates our estimation problem is to choose an
estimate $(\hat{x}_1(t), \hat{x}_2(t))$ on the unit circle; i.e., such that

$$\hat{x}_1^2(t) + \hat{x}_2^2(t) = 1 \qquad (6.7)$$

If we use the least squares criterion

$$J = \tfrac{1}{2}\, E[(x_1(t) - \hat{x}_1(t))^2 + (x_2(t) - \hat{x}_2(t))^2 \mid z(s),\ 0 \le s \le t] \qquad (6.8)$$

subject to (6.7), or equivalently subject to

$$\hat{x}_1(t) = \sin\hat{\theta}(t), \quad \hat{x}_2(t) = \cos\hat{\theta}(t) \qquad (6.9)$$

our criterion reduces to

$$J = E[1 - \cos(\theta(t) - \hat{\theta}(t)) \mid z(s),\ 0 \le s \le t] \qquad (6.10)$$
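
The reduction of the least squares criterion to (6.10) is a one-line
trigonometric identity. It is spelled out here for convenience, assuming
that the leading factor in (6.8) is 1/2 and that the estimate is
parametrized as in (6.9):

```latex
\tfrac{1}{2}\left[ (\sin\theta - \sin\hat{\theta})^2 + (\cos\theta - \cos\hat{\theta})^2 \right]
  = \tfrac{1}{2}\left[ 2 - 2\,(\sin\theta \sin\hat{\theta} + \cos\theta \cos\hat{\theta}) \right]
  = 1 - \cos(\theta - \hat{\theta})
```

Taking the conditional expectation of both sides gives (6.10).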


Thus (6.10) represents a constrained least-squares criterion of the type
discussed in Chapter 4. One can show [B9], [W6] that

$$(\hat{x}_1(t), \hat{x}_2(t)) = \frac{(\hat{x}_1(t|t),\ \hat{x}_2(t|t))}{\sqrt{\hat{x}_1^2(t|t) + \hat{x}_2^2(t|t)}} \qquad (6.11)$$

or

$$\hat{\theta}(t) = \tan^{-1} \frac{\hat{x}_1(t|t)}{\hat{x}_2(t|t)} \qquad (6.12)$$

where

$$\hat{x}_i(t|t) = E[x_i(t) \mid z(s),\ 0 \le s \le t] \qquad (6.13)$$

Referring to Figure 6.1 we can see the geometric significance of
this criterion. One can show that

$$P(t) = \sqrt{\hat{x}_1^2(t|t) + \hat{x}_2^2(t|t)} \le 1 \qquad (6.14)$$

and the quantity $P(t)$ is a measure of our confidence in our estimate.
Specifically, if $\theta$ is a normal random variable with variance $\gamma$, then
(see [W2], [W6])

$$P = \sqrt{[E(\sin\theta)]^2 + [E(\cos\theta)]^2} = e^{-\gamma/2} \qquad (6.15)$$

so $\gamma = 0$ (perfect knowledge of $\theta$) implies $P = 1$,
and $\gamma = \infty$ (no knowledge) implies $P = 0$.
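
The relation between the confidence measure P and the variance of a normally
distributed phase can be confirmed numerically. The sketch below (an added
illustration; the function name and test values are assumptions) evaluates
E[sin theta] and E[cos theta] by midpoint quadrature and compares the
resulting P with exp(-variance/2). The mean of the phase does not affect P.

```python
import math


def confidence(mean, variance, n=4000, width=10.0):
    """P = sqrt(E[sin]^2 + E[cos]^2) for theta ~ N(mean, variance).

    The expectations are evaluated with the midpoint rule over
    mean +/- width standard deviations.
    """
    s = math.sqrt(variance)
    h = 2.0 * width / n
    e_sin = 0.0
    e_cos = 0.0
    for k in range(n):
        u = -width + (k + 0.5) * h  # standardized variable
        weight = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi) * h
        theta = mean + s * u
        e_sin += math.sin(theta) * weight
        e_cos += math.cos(theta) * weight
    return math.hypot(e_sin, e_cos)
```

Small variances give P close to 1, and P decays toward 0 as the variance
grows, in line with the interpretation of P as a confidence measure.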

As discussed in [B9] and [W6], the optimal (constrained least-
squares) filter is described as follows. The conditional probability
density of $\theta$ given $\{z(s),\ 0 \le s \le t\}$ may be expanded in the Fourier
series (notice that the trigonometric polynomials are eigenfunctions
of the Laplacian on $S^1$)

$$p(\theta, t) = \sum_{n=-\infty}^{\infty} c_n(t)\, e^{in\theta} \qquad (6.16)$$

where

$$c_n(t) = \frac{1}{2\pi}\, E[e^{-in\theta(t)} \mid z(s),\ 0 \le s \le t] = b_n(t) - i\, a_n(t) \qquad (6.17)$$

Then the optimal filter is given by

$$dc_n(t) = -\left[ in\omega_c + \frac{n^2}{2}\, q(t) \right] c_n(t)\, dt + \left[ \frac{c_{n-1}(t) - c_{n+1}(t)}{2i} + 2\pi\, c_n(t)\, \mathrm{Im}(c_1(t)) \right] \frac{dz(t) + 2\pi\, \mathrm{Im}(c_1(t))\, dt}{r(t)} \qquad (6.18)$$

$$\hat{\theta}(t) = \tan^{-1}(a_1(t)/b_1(t)) \qquad (6.19)$$

Since $c_0 = 1/2\pi$ and $c_{-n} = c_n^*$ (where * denotes the complex conjugate),
we need only solve (6.18) for $n \ge 1$. The structure of the optimal filter
deserves further comment [W6] (see Figure 6.2). The filter consists of
an infinite bank of filters, the $n$th of which is essentially a damped
oscillator, with oscillator frequency $n\omega_c$, together with nonlinear
couplings to the other filters and to the received signal. Notice,
however, that the equation for $c_n$ is coupled only to the filters for $c_1$,
$c_{n-1}$ and $c_{n+1}$. This fact will play an important part in our
approximation.
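
The coupling pattern just noted can be traced to the measurement function
sin θ. A short derivation, assuming the standard conditional-mean filtering
equation for a scalar observation with noise intensity r(t), and writing
$\phi_n(\theta) = e^{-in\theta}/2\pi$ so that $c_n = E^t[\phi_n]$ (the symbol $\phi_n$ is
introduced here only for this remark):

```latex
E^t[\phi_n \sin\theta]
  = \frac{1}{2\pi}\, E^t\!\left[ e^{-in\theta}\, \frac{e^{i\theta} - e^{-i\theta}}{2i} \right]
  = \frac{c_{n-1}(t) - c_{n+1}(t)}{2i},
\qquad
E^t[\sin\theta] = -2\pi\, \mathrm{Im}(c_1(t)).
```

Hence the gain $E^t[\phi_n \sin\theta] - c_n E^t[\sin\theta]$ involves only $c_{n-1}$, $c_{n+1}$, $c_n$
and $c_1$, and the innovation is $dz(t) - E^t[\sin\theta]\,dt = dz(t) + 2\pi\,\mathrm{Im}(c_1(t))\,dt$.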


In order to construct a finite-dimensional suboptimal filter, we 
wish to approximate the conditional density (6.16) by a density 
determined by a finite set of parameters. Several examples 
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of "assumed density" approximations for this problem are discussed in 
[W2] , [W6], but we will concentrate on one that involves the assumption 
that p (0 , t) is a folded normal density with mode n(t) and "variance" 
y(t) : 

p (0 > t) = ^ ^ e~ n ^ (t)/2 e itt(e-n(t)) = F(e . ^ t)> Y(t)) 

n=- co 

( 6 . 20 ) 

The folded normal density is the solution of the standard diffusion
equation on the circle (i.e., it is the density for Brownian motion
processes) and is as important a density on $S^1$ as the normal is on $R^1$;
we will discuss this point in more detail in the next section.

In this case, if $c_1$ has been computed and if $p(\theta, t)$ satisfies
(6.20), then $c_{N+1}$ can be computed (for any N) from the equation

$$c_{N+1} = (2\pi)^{(N+1)^2 - 1}\, |c_1|^{N(N+1)}\, c_1^{N+1} \qquad (6.21)$$


Thus we can truncate the bank of filters described by (6.18) by
approximating $c_{N+1}$ by (6.21) and substituting this approximation into the
equation for $c_N$. This was done for N=1 in [W6], and the resulting
Fourier coefficient filter (FCF) was compared to a phase-lock loop and
to the Gustafson-Speyer "state-dependent noise filter" (SDNF) [G2]. The
FCF performed consistently better than the other systems, although the
SDNF performance was quite close.
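
The closure relation (6.21) can be checked numerically. The following is a minimal sketch, assuming the folded normal coefficients take the form $c_n = (1/2\pi)\, e^{-n^2\gamma/2} e^{-in\eta}$ implied by (6.20); the function names and the values of $\eta$ and $\gamma$ are illustrative and do not come from the thesis.

```python
import cmath
import math


def folded_normal_coefficient(n, eta, gamma):
    """Fourier coefficient c_n of the folded normal density (6.20)."""
    return cmath.exp(-0.5 * n * n * gamma - 1j * n * eta) / (2.0 * math.pi)


def closure_from_c1(c1, N):
    """Value of c_{N+1} obtained from c_1 alone via the closure relation (6.21)."""
    two_pi = 2.0 * math.pi
    return two_pi ** ((N + 1) ** 2 - 1) * abs(c1) ** (N * (N + 1)) * c1 ** (N + 1)


eta, gamma = 0.7, 0.3  # illustrative mode and "variance" (assumed values)
c1 = folded_normal_coefficient(1, eta, gamma)
for N in (1, 2, 3):
    exact = folded_normal_coefficient(N + 1, eta, gamma)
    approx = closure_from_c1(c1, N)
    # For an exactly folded normal density the two agree up to rounding.
    print(N, abs(exact - approx))
```

In the suboptimal filter the conditional density is only approximately folded normal, so (6.21) is then an approximation rather than an identity.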


6.3 The General Problem

The remainder of this chapter will be devoted to the study of the
estimation problem for the following systems, which are generalizations
of the phase tracking problem. The first system consists of the
bilinear state equation

$$dX(t) = \left[A_0 + \frac{1}{2} \sum_{i,j=1}^{N} Q_{ij}(t)\, A_i A_j\right] X(t)\, dt + \sum_{i=1}^{N} A_i X(t)\, dw_i(t) \qquad (6.22)$$

with linear measurements

$$dz_1(t) = X(t)\, h(t)\, dt + R^{1/2}(t)\, dv(t) \qquad (6.23)$$


where X(t) and $\{A_i\}$ are n×n matrices, $z_1(t)$ is a p-vector, w is a Wiener
process with strength Q(t) > 0, v is a standard Wiener process independent
of w, and R > 0. More general linear measurements can obviously be
considered, but for simplicity of notation we restrict our attention to
(6.23), which arises in the star tracking example of Chapter 4. We also
assume that the Lie group $G = \{\exp \mathcal{L}\}_G$ is compact; hence, Theorem B.3
implies that there is a symmetric positive definite matrix P such that,
for all t,

$$X'(t)\, P\, X(t) = P \qquad (6.24)$$

In addition, it is shown in [D5] that this is true if and only if

$$A'P + PA = 0 \quad \text{for all } A \in \mathcal{L} \qquad (6.25)$$

In particular $\{A_0, A_1, \ldots, A_N\}$ satisfy (6.25).

Another way to derive (6.24) from (6.25) is through the use of
Ito's differential rule [J1], [W8]. Assuming that $\{A_0, A_1, \ldots, A_N\}$
satisfy (6.25), we see that

$$d(X'PX) = X'\left[\left(A_0 + \frac{1}{2}\sum_{i,j=1}^{N} Q_{ij} A_i A_j\right)' P + P\left(A_0 + \frac{1}{2}\sum_{i,j=1}^{N} Q_{ij} A_i A_j\right)\right] X\, dt + \sum_{i=1}^{N} X'(A_i'P + PA_i)X\, dw_i + \sum_{i,j=1}^{N} Q_{ij}\, X' A_i' P A_j X\, dt \qquad (6.26)$$

The last term in (6.26) is the correction term from Ito's differential
rule (it is computed using the rule $dw_i(t)\, dw_j(t) = Q_{ij}(t)\, dt$). The
identity (6.25) implies that d(X'PX) = 0; hence, if X'(0)PX(0) = P, then
X'(t)PX(t) = P for all t.
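
The cancellation in (6.26) can be illustrated numerically. The sketch below is an assumption-laden example rather than part of the thesis: it takes P = I, uses the standard generators of so(3) (which satisfy (6.25) because they are skew-symmetric), and picks an arbitrary symmetric Q. It forms the matrix M for which the dt-terms of (6.26) equal X'MX dt and checks that M vanishes.

```python
def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def axpy(s, A, B):
    """Return s*A + B entrywise."""
    return [[s * a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def dt_coefficient(A0, As, Q, P):
    """Matrix M such that the dt-terms of (6.26) equal X' M X dt."""
    n = len(P)
    drift = [row[:] for row in A0]
    correction = [[0.0] * n for _ in range(n)]
    for i, Ai in enumerate(As):
        for j, Aj in enumerate(As):
            drift = axpy(0.5 * Q[i][j], matmul(Ai, Aj), drift)
            correction = axpy(Q[i][j], matmul(matmul(transpose(Ai), P), Aj), correction)
    symmetric_part = axpy(1.0, matmul(transpose(drift), P), matmul(P, drift))
    return axpy(1.0, symmetric_part, correction)


# Illustrative data (assumed): P = I and the generators of so(3).
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
E1 = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
E2 = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
E3 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
Q = [[2.0, 0.5], [0.5, 1.0]]  # symmetric noise strength

M = dt_coefficient(E3, [E1, E2], Q, I3)
residual = max(abs(v) for row in M for v in row)
print(residual)  # vanishes (up to rounding) when (6.25) holds and Q is symmetric
```

The dw-terms of (6.26) vanish separately, since each $A_i'P + PA_i$ is zero by (6.25).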


The second system consists of the bilinear state equation

$$dx(t) = \left[A_0 + \frac{1}{2} \sum_{i,j=1}^{N} Q_{ij}(t)\, A_i A_j\right] x(t)\, dt + \sum_{i=1}^{N} A_i x(t)\, dw_i(t) \qquad (6.27)$$

with linear measurements

$$dz_2(t) = H(t)\, x(t)\, dt + R^{1/2}(t)\, dv(t) \qquad (6.28)$$

where x(t) is an n-vector, $\{A_i\}$ are n×n matrices, and $z_2$, v, and w are
as above. We assume that x evolves on a compact homogeneous space.

The solution of (6.27) is

$$x(t) = X(t)\, x(0) \qquad (6.29)$$

where X satisfies (6.22) with X(0) = I. Since x evolves on a compact
homogeneous space, X must evolve on a compact Lie group; thus X(t)
satisfies (6.24) for all t and $\{A_0, A_1, \ldots, A_N\}$ satisfy (6.25). Then

$$x'(t)\, P\, x(t) = x'(0)\, X'(t)\, P\, X(t)\, x(0) = x'(0)\, P\, x(0) \qquad (6.30)$$

so the homogeneous space on which x evolves is of the form x'Px = constant.
This conclusion could also be reached by using Ito's differential rule and
(6.25) as above.

We now show that we need only consider systems evolving on the Lie
group $SO(n) = \{X \in R^{n \times n} \mid X'X = I\}$ and the homogeneous space
$S^n = \{x \in R^n \mid x'x = 1\}$, the n-sphere. First consider X satisfying (6.22)
and (6.24), and define Y by

$$Y(t) = P^{1/2}\, X(t)\, P^{-1/2} \qquad (6.31)$$

Then Y satisfies (6.22) (with each $A_i$ replaced by $P^{1/2} A_i P^{-1/2}$, which we
again denote by $A_i$), but now Y'(t)Y(t) = I and

$$A_i' + A_i = 0, \quad i = 0, 1, \ldots, N \qquad (6.32)$$

and

$$\hat{X}(t|t) = P^{-1/2}\, \hat{Y}(t|t)\, P^{1/2} \qquad (6.33)$$

So the estimation problem for X is solved if we can solve the problem
for Y evolving on SO(n).

Similarly, if x satisfies (6.27) and (6.30), we define

$$y(t) = P^{1/2}\, x(t) \qquad (6.34)$$

Then y satisfies (6.27) and (6.32), and

$$y'(t)\, y(t) = y'(0)\, y(0) \qquad (6.35)$$

Thus y evolves on $S^n$ if $\|y(0)\|^2 = y'(0)y(0) = 1$. The estimate $\hat{x}(t|t)$
can be computed according to

$$\hat{x}(t|t) = P^{-1/2}\, \hat{y}(t|t) \qquad (6.36)$$

Because of the above analysis, we will limit our discussions in this
chapter to systems evolving on SO(n) and $S^n$; i.e., we will assume that
$\{A_0, A_1, \ldots, A_N\}$ satisfy (6.32) (they are skew-symmetric).
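
The change of variables (6.31) can be illustrated with a small numerical check. This is a sketch under simplifying assumptions that are not in the thesis: P is taken to be diagonal (so that $P^{1/2}$ is elementwise), and A is constructed as $P^{-1}S$ with S skew-symmetric, which guarantees (6.25). The check confirms that the transformed generator $P^{1/2} A P^{-1/2}$ is skew-symmetric, as required by (6.32).

```python
import math

p = [4.0, 1.0, 0.25]  # diagonal of P (illustrative, positive entries)
S = [[0.0, -1.0, 2.0], [1.0, 0.0, -3.0], [-2.0, 3.0, 0.0]]  # skew-symmetric
n = len(p)

# A = P^{-1} S satisfies A'P + PA = S' + S = 0, i.e. condition (6.25).
A = [[S[i][j] / p[i] for j in range(n)] for i in range(n)]
lyap = [[A[j][i] * p[j] + p[i] * A[i][j] for j in range(n)] for i in range(n)]
lyap_defect = max(abs(v) for row in lyap for v in row)

# B = P^{1/2} A P^{-1/2} is the generator that appears in the equation for Y.
B = [[math.sqrt(p[i]) * A[i][j] / math.sqrt(p[j]) for j in range(n)] for i in range(n)]
skew_defect = max(abs(B[i][j] + B[j][i]) for i in range(n) for j in range(n))

print(lyap_defect, skew_defect)  # both vanish up to rounding
```

For a general (non-diagonal) P the same check would need a symmetric matrix square root, which is omitted here to keep the example dependency-free.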

The underlying probability space for the estimation problem (6.22)-
(6.23) on SO(n) is taken to be $(\Omega, \mathcal{A}, P)$, where $\Omega$ is the space of
continuous functions from [0, T] to SO(n), $\mathcal{A}$ is the Borel $\sigma$-algebra for
$\Omega$, and P is a measure on the space of continuous functions [D2], [W8].
The probability space for (6.27)-(6.28) on $S^n$ is defined analogously.

The estimation criterion which we will use for these two systems
is the constrained least-squares estimator of Chapter 4. As discussed
in Section 4.2, the optimal estimate for the SO(n) system is

$$\tilde{X}(t|t) = \hat{X}(t|t)\left[\hat{X}'(t|t)\, \hat{X}(t|t)\right]^{-1/2} \qquad (6.37)$$

The optimal estimate for the $S^n$ system is given by (see Section 4.3)

$$\tilde{x}(t|t) = \frac{\hat{x}(t|t)}{\|\hat{x}(t|t)\|} \qquad (6.38)$$

Thus in both cases we must compute the conditional expectation of the
state (x(t) or X(t)) given the observations $z^t = \{z(s),\ 0 \le s \le t\}$.

The equations for computing the conditional expectation can, as
discussed in Chapter 2, be derived from the nonlinear filtering equation
(1.7) and the moment equation (2.20). The resultant equations for the
SO(n) system (6.22)-(6.23) are

$$dE^t[X_v^{[p]}(t)] = \left[\left(A_{0[p]} + \frac{1}{2}\sum_{i,j=1}^{N} Q_{ij}(t)\, A_{i[p]} A_{j[p]}\right) \otimes I\right] E^t[X_v^{[p]}(t)]\, dt + \left\{E^t[X_v^{[p]}(t)\, h'(t)\, X'(t)] - E^t[X_v^{[p]}(t)]\, h'(t)\, E^t[X'(t)]\right\} R^{-1}(t)\, d\nu_1(t) \qquad (6.39)$$

$$d\nu_1(t) = dz_1(t) - \hat{X}(t|t)\, h(t)\, dt \qquad (6.40)$$

where $\otimes$ denotes Kronecker product and $X_v^{[p]}$ is the vector containing the
elements of the matrix $X^{[p]}$ in lexicographic order [B8, p. 64], [M13,
p. 9], [B13]. For the $S^n$ system (6.27)-(6.28), we have

$$dE^t[x^{[p]}(t)] = \left[A_{0[p]} + \frac{1}{2}\sum_{i,j=1}^{N} Q_{ij}(t)\, A_{i[p]} A_{j[p]}\right] E^t[x^{[p]}(t)]\, dt + \left\{E^t[x^{[p]}(t)\, x'(t)] - E^t[x^{[p]}(t)]\, E^t[x'(t)]\right\} H'(t)\, R^{-1}(t)\, d\nu_2(t) \qquad (6.41)$$

$$d\nu_2(t) = dz_2(t) - H(t)\, \hat{x}(t|t)\, dt \qquad (6.42)$$

As illustrated in Figure 6.3, the structure of these equations is
quite similar to that of (6.18); i.e., each estimator consists of an
infinite bank of filters, and the filter for the $p$th moment is coupled
only to those for the first and $(p+1)$st moments. Therefore, we are led
to the design of suboptimal estimators. Motivated by the success of
Bucy and Willsky's phase tracking example evolving on $S^1$, we would like
to design suboptimal estimators for the SO(n) and $S^n$ systems using
similar techniques.

We will require one further assumption in order to ensure the
existence of the conditional density. Consider the deterministic
systems associated with (6.27) and (6.22), as in Chapter 2:

$$\dot{x}(t) = \left[A_0 + \sum_{i=1}^{N} A_i u_i(t)\right] x(t) \qquad (6.43)$$

$$\dot{X}(t) = \left[A_0 + \sum_{i=1}^{N} A_i u_i(t)\right] X(t) \qquad (6.44)$$

We call (6.43) controllable on $S^n$ if for every pair of points $x_0, x_1 \in S^n$
there exists t > 0 and a piecewise continuous control u such that the
solution $\pi(x_0, u, t)$ of (6.43) with initial condition $x_0$ satisfies
$\pi(x_0, u, t) = x_1$ [J3], [S6]. Controllability of (6.44) on SO(n) is
defined analogously. It will be assumed in this chapter that (6.43)
and (6.44) are controllable on $S^n$ and SO(n), respectively (Brockett
[B4] discusses more explicit criteria for the controllability of these
systems).

For systems defined on $S^n$ or SO(n), controllability implies the
property of strong accessibility [S6]. Thus the results of Elliott
[E3] show that, under the assumption of controllability, (6.22) and
(6.27) have smooth transition probability densities (with respect to
the Riemannian measure on $S^n$ or the Haar measure on SO(n); see
Appendix B). It is easy to show from the definition of conditional
expectation [W8] that, for each $\omega \in \Omega$ and each t, the conditional
probability measure $P(\cdot \mid z^t)(\omega)$ is absolutely continuous with respect
to the unconditional probability measure $P(\cdot)$. Hence the Radon-Nikodym
Theorem [R2] implies the existence of the conditional probability
densities p(x, t) of x(t) given $z^t$, with respect to the
Riemannian measure on $S^n$ or the Haar measure on SO(n).

We now review the notions of Brownian motion and Gaussian densities
on Lie groups and homogeneous spaces, which have received
much attention in the literature (see K. Ito [I3], Grenander [G4],
McKean [M7], [M8], Stein [S8], and Yosida [Y1]-[Y3]). Yosida [Y3]
proved that the density p(x, t) of a Brownian motion process on a
Riemannian homogeneous space M with respect to the Riemannian measure
(Haar measure if it is a Lie group) is the fundamental solution of

$$\left[\frac{\partial}{\partial t} - G^*\right] p(x, t) = 0 \qquad (6.45)$$

where G* is the formal adjoint of a differential operator expressible
in local coordinates as

$$G = \sum_{i} f_i \frac{\partial}{\partial x_i} + \frac{1}{2} \sum_{i,j} Q_{ij} \frac{\partial^2}{\partial x_i\, \partial x_j}$$

with constant f and $Q = Q' \ge 0$. In particular, if G is the Laplace-
Beltrami operator (which is self-adjoint [H3]), the fundamental solution
of

$$\left[\frac{\partial}{\partial t} - \gamma \Delta\right] p(x, t) = 0 \qquad (6.46)$$

where $\gamma > 0$, is the density of a Brownian motion on M. According to [M13] and [S8],
the fundamental solution of (6.46) is given by

$$p(x, t;\ x_0, t_0) = \sum_{i} \phi_i(x)\, \phi_i(x_0)\, e^{-\lambda_i \gamma (t - t_0)}$$

where $\lambda_i$ and $\phi_i$ are the eigenvalues and the corresponding eigenfunctions
of the Laplace-Beltrami operator (see Section B.4). The function
$p(x, t;\ x_0, t_0)$ is the solution to (6.46) with initial condition equal to
the singular distribution concentrated at $x = x_0$. Also, Grenander [G4]
defines a Gaussian (normal) density to be the solution of (6.45) for
some t.


The folded normal density $F(\theta;\ \eta, \gamma)$ used by Willsky as an assumed
density approximation for the phase tracking problem is indeed a normal
density on $S^1$ in the sense of Grenander [W2]. Motivated by the success
of Willsky's suboptimal filter, we will design suboptimal estimators
for the SO(n) and $S^n$ bilinear systems by employing normal assumed
conditional densities of the form

$$p(x, t) = \sum_{i} \phi_i(x)\, \phi_i(\eta(t))\, e^{-\lambda_i \gamma(t)} \qquad (6.47)$$

where $\eta(t)$ and $\gamma(t)$ are parameters of the density which are to be
estimated.


6.4 Estimation on $S^n$

In this section we will use the suboptimal estimation technique
discussed in the previous section in order to design filters for the $S^n$
estimation problem (6.27)-(6.28). The optimal constrained least-squares
estimator is described by (6.38) and (6.41)-(6.42). We will first
describe the suboptimal estimator in detail for $S^2$; then we will discuss
the generalization to $S^n$. The problem is also of importance because
the satellite tracking problem of Section 4.4 is of this form (notice
that equation (4.35) has a time-varying drift term; however, this can
be easily handled in the present framework).

In our discussion of estimation on $S^2$, we will refer to a point
on $S^2$ in terms of the Cartesian coordinates $x \triangleq (x_1, x_2, x_3)$ or the
polar coordinates $(\theta, \phi)$ (see (B.42)). The decomposition (B.41) of
homogeneous polynomials of degree n (restricted to $S^2$) in terms of the
spherical harmonics of degree $\le n$ implies the existence of a nonsingular
matrix P such that

$$P x^{[n]} = \begin{bmatrix} Y_n(x) \\ Y_{n-2}(x) \\ \vdots \\ Y_\delta(x) \end{bmatrix} \qquad (6.48)$$

where $Y_\ell(x)$ is the $(2\ell + 1)$-vector whose components are the spherical
harmonics $\{Y_{\ell m},\ -\ell \le m \le \ell\}$ of degree $\ell$ (defined in (B.46)-(B.47)) and
$\delta$ is zero or one depending on whether n is even or odd. Here the spaces
spanned by $Y_\ell(x)$, $\ell = \delta, \delta + 2, \ldots, n-2, n$ are all invariant under the
action of SO(3) (this decomposition is related to the classical notions
of contractions and traceless tensors [H5]; see also Brockett [B3]).
Hence the conditional moments $E^t[x^{[p]}(t)]$, and consequently the optimal
estimator (6.41), could have been expressed in terms of the "generalized
Fourier coefficients"

$$c_{\ell m}(t) = \int_0^{2\pi}\!\!\int_0^{\pi} Y^*_{\ell m}(\theta, \phi)\, p(\theta, \phi, t)\, \sin\theta\, d\theta\, d\phi = E^t[Y^*_{\ell m}(\theta(t), \phi(t))] \qquad (6.49)$$
Referring to Section B.5, we note that $Y_{\ell m}$ is an eigenfunction of the
Laplace-Beltrami operator $\Delta_{S^2}$ (defined in (B.45)) with eigenvalue
$-\ell(\ell + 1)$. Thus the assumed density approximation is a normal density
on $S^2$ of the form (6.47), as discussed in the previous section:

$$p(\theta, \phi, t) = \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} Y_{\ell m}(\theta, \phi)\, Y^*_{\ell m}(\eta(t), \lambda(t))\, e^{-\ell(\ell+1)\gamma(t)} \qquad (6.50)$$

In other words, $c_{\ell m}(t)$ (as defined in (6.49)) is assumed to be

$$c_{\ell m}(t) = Y^*_{\ell m}(\eta(t), \lambda(t))\, e^{-\ell(\ell+1)\gamma(t)} \qquad (6.51)$$

In order to truncate the optimal estimator after the $\hat{x}^{[N]}(t|t)$
equation using the assumed density (6.50), we must compute $E^t[x^{[N]}(t)\, x'(t)]$,
or equivalently, $\hat{x}^{[N+1]}(t|t)$, in terms of $\hat{x}^{[p]}(t|t)$, p = 1, 2, ..., N.
However, if $\hat{x}(t|t)$ is known, so are $c_{10}(t)$ and $c_{11}(t)$, and a simple
computation yields

$$\gamma(t) = -\frac{1}{4} \log\left[\frac{4\pi}{3}\left(c_{10}^2(t) + 2|c_{11}(t)|^2\right)\right] \qquad (6.52)$$

$$\cos \eta(t) = \frac{c_{10}(t)}{\left[c_{10}^2(t) + 2|c_{11}(t)|^2\right]^{1/2}} \qquad (6.53)$$

$$\sin \eta(t) = \frac{\pm\sqrt{2}\, |c_{11}(t)|}{\left[c_{10}^2(t) + 2|c_{11}(t)|^2\right]^{1/2}} \qquad (6.54)$$

If $c_{11}(t) = 0$, then the density is independent of $\lambda(t)$; otherwise,

$$e^{2i\lambda(t)} = \frac{c^*_{11}(t)}{c_{11}(t)} \qquad (6.55)$$

Then $\{c_{N+1,m},\ m = -(N+1), \ldots, N+1\}$ can be computed from

$$c_{N+1,m}(t) = Y^*_{N+1,m}(\eta(t), \lambda(t))\, e^{-(N+1)(N+2)\gamma(t)} = (-1)^m \left[\frac{(N+1-m)!}{(N+1+m)!}\, \frac{2N+3}{4\pi}\right]^{1/2} \left[\frac{c_{11}(t)}{c^*_{11}(t)}\right]^{m/2} P_{N+1,m}\!\left(\frac{c_{10}(t)}{\left(c_{10}^2(t) + 2|c_{11}(t)|^2\right)^{1/2}}\right) \left[\frac{4\pi}{3}\left(c_{10}^2(t) + 2|c_{11}(t)|^2\right)\right]^{\frac{1}{4}(N+1)(N+2)} \qquad (6.56)$$

Finally, notice that (B.41) and (6.48) imply the existence of a
nonsingular matrix P such that

$$P x^{[N+1]} = \begin{bmatrix} Y_{N+1}(x) \\ x^{[N-1]} \end{bmatrix}$$

where $Y_\ell$ is the $(2\ell+1)$-vector with components $\{Y_{\ell m},\ -\ell \le m \le \ell\}$. Thus
$\hat{x}^{[N+1]}(t|t)$ can be computed from $\{c_{N+1,m},\ -(N+1) \le m \le N+1\}$ and
$\hat{x}^{[N-1]}(t|t)$. The optimal estimator (6.41) is truncated by substituting
this approximation for $\hat{x}^{[N+1]}(t|t)$ into the equation for $\hat{x}^{[N]}(t|t)$.
Notice that the entire procedure for truncating the optimal estimator
can equivalently be performed on the infinite set of coupled equations
for the generalized Fourier coefficients $\{c_{\ell m}\}$, using the approximation
(6.51).
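
The parameter-recovery step on $S^2$ can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the standard normalization $Y_{10} = \sqrt{3/4\pi}\cos\theta$, $Y_{11} = -\sqrt{3/8\pi}\sin\theta\, e^{i\phi}$, $Y_{20} = \sqrt{5/16\pi}\,(3\cos^2\theta - 1)$, takes $\sin\eta \ge 0$ (i.e., $0 \le \eta \le \pi$), and uses hypothetical function names. Under the assumed density (6.51), the first-order coefficients determine $\gamma$, $\eta$, and $\lambda$, which in turn give a second-order coefficient.

```python
import cmath
import math


def first_order_coefficients(eta, lam, gamma):
    """c_10 and c_11 under the assumed density (6.51), for which l(l+1) = 2."""
    decay = math.exp(-2.0 * gamma)
    c10 = math.sqrt(3.0 / (4.0 * math.pi)) * math.cos(eta) * decay
    c11 = -math.sqrt(3.0 / (8.0 * math.pi)) * math.sin(eta) * cmath.exp(-1j * lam) * decay
    return c10, c11


def recover_parameters(c10, c11):
    """Recover (gamma, eta, lambda) from c_10 and c_11, assuming 0 <= eta <= pi."""
    s = c10 ** 2 + 2.0 * abs(c11) ** 2
    gamma = -0.25 * math.log(4.0 * math.pi * s / 3.0)
    cos_eta = max(-1.0, min(1.0, c10 / math.sqrt(s)))  # clamp against rounding
    eta = math.acos(cos_eta)
    # With sin(eta) > 0, -conj(c11) has phase lambda; lambda is irrelevant if c11 = 0.
    lam = cmath.phase(-c11.conjugate()) if c11 != 0 else 0.0
    return gamma, eta, lam


def predicted_c20(c10, c11):
    """Second-order coefficient c_20 implied by the assumed density."""
    gamma, eta, _ = recover_parameters(c10, c11)
    y20 = math.sqrt(5.0 / (16.0 * math.pi)) * (3.0 * math.cos(eta) ** 2 - 1.0)
    return y20 * math.exp(-6.0 * gamma)  # l(l+1) = 6 for l = 2


c10, c11 = first_order_coefficients(eta=1.1, lam=-0.4, gamma=0.2)
print(recover_parameters(c10, c11))
print(predicted_c20(c10, c11))
```

In a truncated filter, coefficients such as `predicted_c20` would replace the unavailable higher-order terms in the last retained moment equation.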


We note that one can show that

$$\|\hat{x}(t|t)\| \le 1$$

and (see (6.14)-(6.15)) this quantity can be used as a measure of our
confidence in our estimate. If $\hat{x}(t|t)$ satisfies the assumed density
(6.50),

$$\|\hat{x}(t|t)\| = e^{-2\gamma(t)} \qquad (6.57)$$

and we can perform a similar analysis to that in the $S^1$ case (see (6.15)).

Example 6.1: Suppose that we truncate the optimal $S^2$ estimator (6.41)
after N = 1; i.e., we approximate $\hat{x}^{[2]}(t|t)$ using the above approximation.
The resulting suboptimal estimator is (for Q(t) = I)

$$d\hat{x}(t|t) = \left[A_0 + \frac{1}{2} \sum_{i=1}^{N} A_i^2\right] \hat{x}(t|t)\, dt + P(t)\, H'(t)\, R^{-1}(t)\left[dz_2(t) - H(t)\, \hat{x}(t|t)\, dt\right] \qquad (6.58)$$

where the covariance matrix P(t) is given by

$$P_{ii}(t) = \hat{x}_i^2(t|t)\left(\tfrac{2}{3}\|\hat{x}(t|t)\| - 1\right) - \tfrac{1}{3}\left(\hat{x}_j^2(t|t) + \hat{x}_k^2(t|t)\right)\|\hat{x}(t|t)\| + \tfrac{1}{3} \qquad (6.59)$$

for $i \ne j$, $i \ne k$, $j \ne k$, and

$$P_{ij}(t) = \hat{x}_i(t|t)\, \hat{x}_j(t|t)\left(\|\hat{x}(t|t)\| - 1\right) \qquad (6.60)$$

for $i \ne j$. Notice that, from (6.57), $\|\hat{x}(t|t)\| = 1$ implies that the
"variance" $\gamma(t) = 0$; in fact, if $\|\hat{x}(t|t)\| = 1$, we see from (6.59)-
(6.60) that the covariance matrix P(t) is identically zero. Thus if
$\|\hat{x}(t|t)\| = 1$, this first order suboptimal filter assumes that it has
perfect knowledge of x(t) and disregards the measurements.
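
The covariance of Example 6.1 is simple enough to evaluate directly. The sketch below assumes the forms $P_{ii} = \hat{x}_i^2(\tfrac{2}{3}r - 1) - \tfrac{1}{3}(\hat{x}_j^2 + \hat{x}_k^2)\,r + \tfrac{1}{3}$ and $P_{ij} = \hat{x}_i\hat{x}_j(r - 1)$ with $r = \|\hat{x}(t|t)\|$; the constant $\tfrac{1}{3}$ is the value for which P vanishes when $r = 1$, as the example states. The function name and test vectors are illustrative.

```python
import math


def first_order_covariance(x_hat):
    """Covariance P of the first-order suboptimal S^2 filter, per (6.59)-(6.60)."""
    r = math.sqrt(sum(v * v for v in x_hat))
    P = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        j, k = (i + 1) % 3, (i + 2) % 3  # the two indices different from i
        P[i][i] = (x_hat[i] ** 2 * (2.0 * r / 3.0 - 1.0)
                   - (x_hat[j] ** 2 + x_hat[k] ** 2) * r / 3.0
                   + 1.0 / 3.0)
        P[i][j] = x_hat[i] * x_hat[j] * (r - 1.0)
        P[i][k] = x_hat[i] * x_hat[k] * (r - 1.0)
    return P


# A unit-norm conditional mean: the filter behaves as if x(t) were known exactly.
P_unit = first_order_covariance([0.6, 0.0, 0.8])
print(max(abs(v) for row in P_unit for v in row))

# A shorter conditional mean: the trace equals 1 - ||x_hat||^2.
x_hat = [0.3, -0.2, 0.5]
P_short = first_order_covariance(x_hat)
print(sum(P_short[i][i] for i in range(3)), 1.0 - sum(v * v for v in x_hat))
```

The trace identity follows because $E\|x\|^2 = 1$ on the sphere, so the total variance is $1 - \|\hat{x}\|^2$; it provides a quick sanity check on any implementation of (6.59).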


The extension to $S^n$ of this technique for constructing suboptimal
estimators is straightforward. The procedure uses the spherical
harmonics on $S^n$, as defined in Section B.5. In polar coordinates, a
point on $S^n$ can be described by $(\theta_1, \theta_2, \ldots, \theta_{n-1}, \phi) \triangleq (\theta, \phi)$, where
$0 \le \theta_i \le \pi$ and $0 \le \phi \le 2\pi$. Also, the spherical harmonics are denoted
by

$$Y_{\ell,(m)}(\theta, \phi) \triangleq Y(\theta_1, \ldots, \theta_{n-1}, \pm\phi) = e^{\pm i m_{n-1} \phi} \prod_{k=0}^{n-2} (\sin \theta_{k+1})^{m_{k+1}}\, C_{m_k - m_{k+1}}^{m_{k+1} + \frac{1}{2}(n-k-1)}(\cos \theta_{k+1}) \qquad (6.61)$$

where $\ell = m_0 \ge m_1 \ge \cdots \ge m_{n-1} \ge 0$ and $C_j^{\nu}$ are the Gegenbauer polynomials
[E1] (that is, the functions $Y_{\ell,(m)}$ satisfy the four properties of
Section B.5). Since $Y_{\ell,(m)}$ is an eigenfunction of the Laplace-Beltrami
operator with eigenvalue $-\ell(n + \ell - 1)$, the assumed density approximation
on $S^n$ is

$$p(\theta, \phi, t) = \sum_{\ell,(m)} Y_{\ell,(m)}(\theta, \phi)\, Y^*_{\ell,(m)}(\eta(t), \lambda(t))\, e^{-\ell(\ell+n-1)\gamma(t)} \qquad (6.62)$$

That is, $c_{\ell,(m)}(t) \triangleq E^t[Y^*_{\ell,(m)}(\theta(t), \phi(t))]$ is assumed to be

$$c_{\ell,(m)}(t) = Y^*_{\ell,(m)}(\eta(t), \lambda(t))\, e^{-\ell(\ell+n-1)\gamma(t)} \qquad (6.63)$$

The procedure for truncating the filter (6.41) is identical to the
$S^2$ case. If $\hat{x}(t|t)$ is known, so are $\{c_{1,(m)}(t)\}$, and these can be used
to compute $\gamma(t)$, $\eta(t)$, and $\lambda(t)$. Then $\{c_{N+1,(m)}(t)\}$ can be computed
from (6.63), and $\hat{x}^{[N+1]}(t|t)$ can be computed from $\{c_{N+1,(m)}(t)\}$ and
$\hat{x}^{[N-1]}(t|t)$. The estimator is truncated by substituting this approximate
expression for $\hat{x}^{[N+1]}(t|t)$ into the equation (6.41) for $\hat{x}^{[N]}(t|t)$.


6.5 Estimation on SO(n)

In this section we discuss the construction of suboptimal estimators
for the SO(n) estimation problem (6.22)-(6.23). Since SO(2) is
isomorphic to the circle $S^1$, the case n=2 was discussed in Section 6.2.
We will first consider the SO(3) problem, the importance of which was
discussed in Chapter 4. Then we will extend the results to SO(n).

Consider the sequence $\{D^\ell,\ \ell = 0, 1, \ldots\}$ of irreducible unitary
representations of SO(3), as defined in (B.34)-(B.35). Theorem B.7
implies that, for fixed $\ell$, the matrix elements $\{D^\ell_{mn};\ -\ell \le m, n \le \ell\}$ are
eigenfunctions of the bi-invariant Laplacian defined in (B.33)
with the same eigenvalue $\lambda_\ell$; also, all eigenfunctions of the Laplacian
can be written as linear combinations of the $\{D^\ell_{mn}\}$. Hence, the assumed
density which will be used to truncate the optimal estimator (6.39)-
(6.40) is a normal density on SO(3) of the form (6.47):

$$p(R, t) = \sum_{\ell=0}^{\infty} \sum_{m,n=-\ell}^{\ell} D^\ell_{mn}(R)\, D^\ell_{mn}(\eta(t))^*\, e^{-\lambda_\ell \gamma(t)} \qquad (6.64)$$

where $R, \eta(t) \in SO(3)$ and $\gamma(t)$ is a scalar. That is,

$$c^\ell_{mn}(t) \triangleq E^t[D^\ell_{mn}(X(t))^*] \qquad (6.65)$$

is assumed to be

$$c^\ell_{mn}(t) = D^\ell_{mn}(\eta(t))^*\, e^{-\lambda_\ell \gamma(t)} \qquad (6.66)$$

The procedure for truncating the filter (6.39) is similar to the
$S^n$ case, although we make use of some additional concepts from
representation theory. If $\hat{X}(t|t)$ is known, so are $\{c^1_{mn}(t);\ -1 \le m, n \le 1\}$, since
$D^1$ is equivalent to the self-representation of SO(3). Define the matrix
$C^\ell(t)$ with elements $c^\ell_{mn}(t)$, $-\ell \le m, n \le \ell$; then

$$A(t) \triangleq C^1(t)^\dagger\, C^1(t) = [D^1(\eta(t))]'\, [D^1(\eta(t))]^*\, e^{-2\lambda_1 \gamma(t)} = I \cdot e^{-2\lambda_1 \gamma(t)} \qquad (6.67)$$

since $D^1$ is unitary (here $C^\dagger$ is the Hermitian transpose of C). Thus $\gamma(t)$
can be computed from

$$\gamma(t) = -\frac{1}{2\lambda_1} \log\left[\tfrac{1}{3}\, \mathrm{tr}\, A(t)\right] \qquad (6.68)$$

Then the elements of $\eta(t)$ can be computed from (6.66) and (6.68), since
$D^1(\eta(t))$ is similar to $\eta(t)$. Once $\gamma(t)$ and $\eta(t)$ have been computed,
$\{c^{N+1}_{mn};\ -(N+1) \le m, n \le N+1\}$ are computed from the formula (6.66).
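
The recovery of $\gamma(t)$ and $\eta(t)$ from the first-order coefficient matrix can be sketched numerically. The example below rests on assumptions that are not in the thesis: it works in a real basis in which $D^1(\eta)$ is $\eta$ itself (the text states only that the two are similar), so that $C^1 = \eta\, e^{-\lambda_1\gamma}$, and it uses a placeholder value for $\lambda_1$, whose actual value depends on the normalization of the Laplacian in (B.33).

```python
import math


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def recover_gamma_eta(C, lam1):
    """Invert C = eta * exp(-lam1 * gamma) for an orthogonal matrix eta."""
    A = matmul(transpose(C), C)                  # real analogue of (6.67)
    trace = sum(A[i][i] for i in range(3))
    gamma = -math.log(trace / 3.0) / (2.0 * lam1)  # (6.68)
    scale = math.exp(lam1 * gamma)
    eta = [[scale * c for c in row] for row in C]  # undo the decay factor in (6.66)
    return gamma, eta


lam1 = 2.0  # placeholder eigenvalue (assumed, see lead-in)
angle, gamma_true = 0.5, 0.3
eta_true = [[math.cos(angle), -math.sin(angle), 0.0],
            [math.sin(angle), math.cos(angle), 0.0],
            [0.0, 0.0, 1.0]]
decay = math.exp(-lam1 * gamma_true)
C1 = [[decay * v for v in row] for row in eta_true]

gamma_rec, eta_rec = recover_gamma_eta(C1, lam1)
print(gamma_rec)
print(eta_rec)
```

When the conditional density is not exactly of the assumed form, the rescaled matrix need not be orthogonal, which is one reason the constrained estimate (6.37) includes a normalization step.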

In order to truncate (6.39) after the $N$th moment equation, we must
approximate $E^t[X_v^{[N]}(t)\, h'(t)\, X'(t)]$; however, this matrix consists of
time-varying deterministic functions multiplying elements of $\hat{X}^{[N+1]}(t|t)$,
so we will show how to approximate this matrix. The symmetrized Kronecker
pth power $X^{[p]}$ operating on the symmetric tensors $x^{[p]}$ such that
$\|x^{[p]}\| = \|x\|^p = 1$ furnishes a representation of SO(3) which is
reducible [H5], [M16]. In fact, (B.41) and (6.48) imply that there is
a nonsingular matrix P such that

$$P X^{[p]} P^{-1} = \begin{bmatrix} D^p(X) & 0 \\ 0 & X^{[p-2]} \end{bmatrix} \qquad (6.69)$$

The matrix P is related to the Clebsch-Gordan coefficients (B.38)-(B.39),
but P can also be computed by the method of Gantmacher [G7, p. 160]. It
is clear from the decomposition (6.69) that $\hat{X}^{[N+1]}(t|t)$ can be computed
from $C^{N+1}(t)$ and $\hat{X}^{[N-1]}(t|t)$. The optimal estimator (6.39) is truncated
by substituting this approximation into the equation for $\hat{X}^{[N]}(t|t)$.


We note here that, due to the decomposition (6.69), the estimation
equations and the truncation procedure could have been expressed solely
in terms of the irreducible representations $D^p(X(t))$. However, we have
chosen to work with the $X^{[p]}$ equations primarily for ease of notation.
For large N, the $D^p$ equations would provide significant computational
savings over the $X^{[p]}$ equations, as these are redundant; however, the
practical implementation of this technique will probably be limited to
small values of N.

As in the previous section, the extension of this technique to SO(n)
is straightforward. In this case, we make use of the irreducible
representations of SO(n) denoted by $D^{[f_1, \ldots, f_k]}$, where n = 2k
or n = 2k+1 and $[f] = [f_1, \ldots, f_k]$ denotes a Young pattern (see Section
B.5). Theorem B.7 implies that, for fixed [f], the matrix elements
$\{D^{[f]}_{\ell m}\}$ are eigenfunctions of the bi-invariant Laplacian
on SO(n) with the same eigenvalue $\lambda_{[f]}$. Thus the assumed density is a
normal density on SO(n) of the form

$$p(R, t) = \sum_{[f]} \sum_{\ell, m} D^{[f]}_{\ell m}(R)\, D^{[f]}_{\ell m}(\eta(t))^*\, e^{-\lambda_{[f]} \gamma(t)} \qquad (6.70)$$

where $R, \eta(t) \in SO(n)$ and $\gamma(t)$ is a scalar. That is,

$$c^{[f]}_{\ell m}(t) \triangleq E^t[D^{[f]}_{\ell m}(X(t))^*] \qquad (6.71)$$

is assumed to be

$$c^{[f]}_{\ell m}(t) = D^{[f]}_{\ell m}(\eta(t))^*\, e^{-\lambda_{[f]} \gamma(t)} \qquad (6.72)$$

If $\hat{X}(t|t)$ is known, so are $\{c^{[1,0,\ldots,0]}_{\ell m}(t);\ 1 \le \ell, m \le n\}$, since
$D^{[1,0,\ldots,0]}$ is just the self-representation of SO(n) (see Section B.5).
If we define the matrix $C^1(t)$ with elements $\{c^{[1,0,\ldots,0]}_{\ell m}(t);\ 1 \le \ell, m \le n\}$,
then

$$A(t) \triangleq [C^1(t)]'\, [C^1(t)] = \eta'(t)\, \eta(t)\, e^{-2\lambda_{[1,0,\ldots,0]} \gamma(t)} = I \cdot e^{-2\lambda_{[1,0,\ldots,0]} \gamma(t)} \qquad (6.73)$$

and $\gamma(t)$ can be computed from

$$\gamma(t) = -\frac{1}{2\lambda_{[1,0,\ldots,0]}} \log\left[\tfrac{1}{n}\, \mathrm{tr}\, A(t)\right] \qquad (6.74)$$

Then the elements of $\eta(t)$ can be computed from (6.72) and (6.74).


In order to truncate the optimal estimator (6.39) after the $N$th
moment equation, we approximate $\hat{X}^{[N+1]}(t|t)$ as before. Since the
carrier space of the representation $D^{[p,0,\ldots,0]}$ is spanned by the
spherical harmonics of degree p, the decomposition (B.41) implies that
there exists a nonsingular matrix P such that

$$P X^{[p]} P^{-1} = \begin{bmatrix} D^{[p,0,\ldots,0]}(X) & 0 \\ 0 & X^{[p-2]} \end{bmatrix} \qquad (6.75)$$

(see Section B.5).

Hence, precisely as in the SO(3) case, we compute
$C^{N+1}(t) \triangleq C^{[N+1,0,\ldots,0]}(t)$ from (6.72)
and then compute $\hat{X}^{[N+1]}(t|t)$ from $C^{N+1}(t)$ and $\hat{X}^{[N-1]}(t|t)$. The optimal
estimator (6.39) is truncated by substituting this approximation into
the equation for $\hat{X}^{[N]}(t|t)$.

CHAPTER 7


CONCLUSION AND SUGGESTIONS FOR FUTURE RESEARCH

This thesis has been concerned with estimation and stability for 
nonlinear stochastic systems. The basic approach has been the explicit 
utilization of the algebraic and geometric structure of certain classes 
of systems. With this approach, it was possible to derive some interesting 
conditions for stochastic stability and to design both optimal and sub- 
optimal estimators. A detailed summary of the major results is given 
below. 

7.1 Summary of Results 

First, the stability of bilinear systems driven by colored noise was
considered. Necessary and sufficient conditions for the $p$th order
stability of bilinear systems evolving on solvable Lie groups were
derived, and several examples were presented. Some approximate methods
for deriving stability criteria for general bilinear systems driven by
colored noise were discussed, but no definitive results were obtained.

In order to motivate the discussion of estimation problems and to 
demonstrate the applicability of stochastic bilinear models, several 
practical estimation problems were formulated. These problems involved 
the estimation of three-dimensional rotational processes and the tracking 
of orbiting satellites. 

The investigation of estimation problems involved both optimal and
suboptimal estimation. It was first shown that the optimal conditional
mean estimator for certain classes of systems is finite dimensional.
These classes of systems are characterized by linear measurements of a
Gauss-Markov process $\xi$, which then feeds forward into a nonlinear system.
For some nonlinear systems, including those with a finite Volterra series
and certain bilinear systems, it was proved that the optimal estimator is
finite dimensional. However, for general nonlinear systems the optimal
estimator is infinite dimensional, and a suboptimal estimation technique
was presented.

Finally, suboptimal estimation for bilinear systems driven by white
noise was discussed. The theory of harmonic analysis was used to design
suboptimal estimators for bilinear systems evolving on compact Lie groups
and homogeneous spaces. The basic approach involved the use of
an assumed density, which was the solution of the heat equation on the
appropriate manifold.

7.2 Suggestions for Future Research

In this section, several topics for future research which are 
suggested by the work in this thesis are presented. 

1) The problem of deriving explicit necessary and sufficient
conditions in terms of $A_0, A_1, \ldots, A_N$ for the $p$th order
(asymptotic) stability of the bilinear system (2.12) driven by white
noise. For example, the derivation of necessary and sufficient
conditions under which (2.12) is $p$th order stable for all p is
an open problem.

2) The development of a procedure for bounding the solution of a
general bilinear system by the solution of one in which $\mathcal{L}$ is
solvable. This will lead to better conditions for the stability
of bilinear systems driven by colored noise.

3) The extension of the bilinearization and Volterra series
techniques to nonlinear systems driven by white noise (see
[K4], [L5]). This may permit the application of bilinear
stochastic stability results and the suboptimal estimation
techniques of Chapter 6 to more general nonlinear systems.

4) The evaluation of the suboptimal filters of Chapter 6 by means
of computer simulations. This is presently being done for the
first-order filter of Example 6.1 and the corresponding
second-order filter, for the system (6.27)-(6.28) evolving on $S^2$;
these filters are being compared with the extended Kalman filter
[J1], the Gaussian second-order filter [J1], and the Gustafson-
Speyer "state-dependent noise filter" [G2]. Unfortunately,
these simulations have not been completed in time for
presentation in this thesis.

5) The use of harmonic analysis in estimation for bilinear systems 
driven by colored noise. 

6) The application of the various techniques of this thesis to
both deterministic and stochastic control problems. For
example, a procedure analogous to the one developed in Chapter
6 may provide useful suboptimal controllers for certain problems.


APPENDIX A 


A SUMMARY OF RELEVANT RESULTS FROM ALGEBRA AND DIFFERENTIAL GEOMETRY 
A.1 Introduction

In this appendix we summarize the results from the fields of
differential geometry, Lie groups, and Lie algebras which are relevant
to the research in this thesis. Proofs and more extensive treatments
of these subjects may be found in [A2], [B20], [C3], [G5], [H3], [J4],
[S1], [S2], [W11].

A.2 Lie Groups and Lie Algebras

The study of general Lie groups and Lie algebras requires concepts 
from the theory of differentiable manifolds. However, the research in 
this thesis is primarily concerned with matrix Lie groups and Lie 
algebras, and our basic definitions will follow the work of Brockett 
[Bl] and Willsky [W2]. 

Let $R^{n \times n}$ be the $n^2$-dimensional vector space of n×n matrices with
real-valued entries.

Definition A.1: An n×n matrix Lie algebra $\mathcal{L}$ is a subspace of $R^{n \times n}$
which has the property that if A and B are in $\mathcal{L}$, then so is their
commutator product, $[A, B] \triangleq AB - BA$.

We note that the intersection of two Lie algebras is also a Lie
algebra, but the union, sum, and commutator of two Lie algebras are not
necessarily Lie algebras.

Definition A.2: Let S be a subset of $R^{n \times n}$. The Lie algebra
generated by S, denoted $\{S\}_A$, is the smallest Lie algebra which
contains S.

Definition A.3: A Lie subalgebra of a Lie algebra $\mathcal{L}$ is a subspace
of $\mathcal{L}$ that is also a Lie algebra. A Lie subalgebra $\mathcal{J}$ is an ideal of $\mathcal{L}$
if $[A, B] \in \mathcal{J}$ whenever $A \in \mathcal{L}$ and $B \in \mathcal{J}$.

Definition A.4: Let T be a set of nonsingular matrices in $R^{n \times n}$.
The matrix group generated by T, denoted $\{T\}_G$, is the smallest group
under matrix multiplication which contains T. If S is a subspace of
$R^{n \times n}$, we define the matrix group

$$\{\exp S\}_G \triangleq \{e^{A_1} e^{A_2} \cdots e^{A_p} \mid A_i \in S,\ p = 0, 1, 2, \ldots\} \qquad (A.1)$$

A matrix group G is called a matrix Lie group if there exists a matrix
Lie algebra $\mathcal{L}$ such that

$$G = \{\exp \mathcal{L}\}_G$$

There is then a Lie algebra isomorphism between $\mathcal{L}$ and the tangent
space of G at the identity [S1]. It has been shown by Brockett [B1]
that if $S_1, \ldots, S_p$ is a collection of subspaces of $R^{n \times n}$, then

$$\{\exp S_1, \ldots, \exp S_p\}_G = \{\exp \{S_1, \ldots, S_p\}_A\}_G \qquad (A.2)$$

The relationship between these concepts and the theory of
differentiable manifolds can be explained as follows [B1]. Let $\mathcal{L}$ be a Lie
algebra. At each point T in $\{\exp \mathcal{L}\}_G$ there is a one-to-one mapping $\phi_T$
from a neighborhood of 0 in $\mathcal{L}$ onto a neighborhood of T in $\{\exp \mathcal{L}\}_G$ which
is defined by

$$\phi_T : \mathcal{L} \to \{\exp \mathcal{L}\}_G, \qquad \phi_T(L) = e^{L}\, T \qquad (A.3)$$

Since this map has a smooth inverse, $\{\exp \mathcal{L}\}_G$ is a locally Euclidean
space of dimension equal to the dimension of $\mathcal{L}$. In addition, the set
of maps $\{\phi_T^{-1}\}$ form a differentiable structure of class $C^\infty$ on $\{\exp \mathcal{L}\}_G$
[W11]. Thus $\{\exp \mathcal{L}\}_G$ has the structure of a differentiable manifold
[W11].

The analysis of systems defined on manifolds which do not have a 
Lie group structure leads to the following definitions. 

Definition A.5: Let $M \subset R^n$ be a manifold, and let G be a matrix Lie
group in $R^{n \times n}$. We say that G acts on M if for every $x \in M$ and every
$T \in G$, Tx belongs to M; in this case, G is called a Lie transformation
group. The group G acts transitively on M if it acts on M and if for
every pair of points x, y in M, there exists $T \in G$ such that Tx = y.
If $x \in M$ is fixed, then $H_x = \{T \in G \mid Tx = x\}$ is a subgroup of G called the
isotropy group at x.

Definition A.6: Let G be a Lie group which acts transitively on a
manifold M. Let x be some (fixed) point in M. Let $G/H_x$ be the set
$\{T H_x \mid T \in G\}$ of left cosets modulo $H_x$. Then there is a diffeomorphism
between $G/H_x$ and M, and M is called a homogeneous space (coset space)
[W11].

A.3 Solvable, Nilpotent, and Abelian Groups and Algebras

The definitions and properties of some important classes of Lie 
algebras and Lie groups are presented in this and the next section. 

Definition A.7 [S1]: A Lie algebra $\mathcal{L}$ is solvable if the derived
series of ideals

$$\mathcal{L}^{(0)} = \mathcal{L}$$

$$\mathcal{L}^{(n+1)} = [\mathcal{L}^{(n)}, \mathcal{L}^{(n)}] = \{[A, B] \mid A, B \in \mathcal{L}^{(n)}\}, \quad n \ge 0 \qquad (A.4)$$

terminates in {0}. $\mathcal{L}$ is nilpotent if the lower central series of ideals

$$\mathcal{L}^{0} = \mathcal{L}$$

$$\mathcal{L}^{n+1} = [\mathcal{L}, \mathcal{L}^{n}] = \{[A, B] \mid A \in \mathcal{L},\ B \in \mathcal{L}^{n}\}, \quad n \ge 0 \qquad (A.5)$$

terminates in {0}. $\mathcal{L}$ is abelian if $\mathcal{L}^{(1)} = \mathcal{L}^{1} = \{0\}$. Note that abelian
$\Rightarrow$ nilpotent $\Rightarrow$ solvable, but none of the reverse implications hold in
general.
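
As a concrete illustration of the lower central series (A.5), the following sketch checks that the algebra of strictly upper triangular 3×3 matrices is nilpotent. The choice of this algebra and the helper names are ours, not the thesis's. Because the bracket is bilinear, it suffices to bracket basis elements.

```python
def matmul(A, B):
    """Product of two 3x3 integer matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def bracket(A, B):
    """Commutator product [A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(3)] for i in range(3)]


def unit(i, j):
    """Matrix with a single 1 in position (i, j)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(3)] for r in range(3)]


def is_nonzero(M):
    return any(any(row) for row in M)


# Basis of the strictly upper triangular 3x3 matrices.
basis = [unit(0, 1), unit(1, 2), unit(0, 2)]

# First step of the series: brackets of basis elements span [L, L].
first = [M for M in (bracket(A, B) for A in basis for B in basis) if is_nonzero(M)]

# Second step: brackets of L with the nonzero elements found above.
second = [M for M in (bracket(A, M1) for A in basis for M1 in first) if is_nonzero(M)]

print(len(first), len(second))
```

The first step produces only multiples of the matrix with a single nonzero entry in the upper right corner, and the second step produces nothing, so the series terminates in {0}. By the implication noted above, this algebra is also solvable.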


Lemma A.1 [S1, p. 214]: A matrix Lie algebra $\mathscr{L}$ is solvable if and
only if there exists a (possibly complex-valued) nonsingular matrix P
such that $PAP^{-1}$ is in upper triangular form (zero below the diagonal) for
all $A \in \mathscr{L}$.

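Lemma A.2 [S1, p. 224]: A matrix Lie algebra $\mathscr{L}$ is nilpotent if and
only if there exists a (possibly complex-valued) nonsingular matrix P
such that, for all $A \in \mathscr{L}$, $PAP^{-1}$ has the block diagonal form

$$PAP^{-1} = \begin{bmatrix}
\begin{bmatrix} \phi_1(A) & & * \\ & \ddots & \\ 0 & & \phi_1(A) \end{bmatrix} & & 0 \\
 & \begin{bmatrix} \phi_2(A) & & * \\ & \ddots & \\ 0 & & \phi_2(A) \end{bmatrix} & \\
0 & & \ddots
\end{bmatrix} \tag{A.6}$$

(this will be called the nilpotent canonical form). The functions
$\phi_k: \mathscr{L} \to C$ are linear.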
A useful criterion for solvability can be expressed in terms of the
Killing form.

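Definition A.8: Let $\mathscr{L}$ be a matrix Lie algebra. If $A, B \in \mathscr{L}$, the
operators $\mathrm{ad}_A^i: \mathscr{L} \to \mathscr{L}$ are defined by $\mathrm{ad}_A^0 B = B$, $\mathrm{ad}_A^1 B = \mathrm{ad}_A B = [A,B]$,
$\mathrm{ad}_A^{i+1} B = [A, \mathrm{ad}_A^i B]$. If $\mathscr{H}$ is a Lie subalgebra of $\mathscr{L}$, we define
$\mathrm{ad}_A^i \mathscr{H} = \{\mathrm{ad}_A^i B \mid B \in \mathscr{H}\}$. The Killing form of $\mathscr{L}$ is a symmetric bilinear
form on $\mathscr{L}$ given by

$$K(A,B) = \mathrm{trace}(\mathrm{ad}_A \circ \mathrm{ad}_B) \tag{A.7}$$

Theorem A.1 (Cartan's criterion for solvability) [S2]: A Lie algebra
$\mathscr{L}$ is solvable if and only if K(A,B) = 0 for all A and B in the derived
algebra $\mathscr{L}^{(1)}$.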

We define the corresponding Lie groups as follows. 

Definition A.9: The matrix Lie group $G = \{\exp \mathscr{L}\}_G$ is solvable if
$\mathscr{L}$ is solvable; G is nilpotent if $\mathscr{L}$ is nilpotent; G is abelian if $\mathscr{L}$ is
abelian.

We note that Definition A.9 is equivalent to the usual definition
expressed strictly in terms of properties of the group G [S1].

A.4 Simple and Semisimple Groups and Algebras

It can easily be shown [S2] that the sum of two solvable (nilpotent)
ideals of a Lie algebra $\mathscr{L}$ is solvable (nilpotent). Hence we make the
following definitions [S1], [S2].

Definition A.10: Let $\mathscr{L}$ be a Lie algebra. The radical $\mathscr{R}$ of $\mathscr{L}$ is the
unique maximal solvable ideal of $\mathscr{L}$ (i.e., $\mathscr{R}$ is the sum of the solvable
ideals of $\mathscr{L}$).

Definition A.11: A Lie algebra $\mathscr{L}$ is semisimple if it has no abelian
ideals other than {0}. Thus $\mathscr{L}$ is semisimple if and only if its radical
$\mathscr{R} = \{0\}$. $\mathscr{L}$ is simple if it is non-abelian and has no ideals other than
{0} or $\mathscr{L}$.

The Killing form can also be used to formulate a criterion for 
semisimplicity. 

Theorem A.2 (Cartan's criterion for semisimplicity) [S2]:
A Lie algebra $\mathscr{L}$ is semisimple if and only if its Killing form is
non-degenerate (i.e., if $A \in \mathscr{L}$ and K(A,B) = 0 for all $B \in \mathscr{L}$, then A = 0).
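As an illustration of the two Cartan criteria, the Killing form (A.7) can be evaluated directly from structure constants. The sketch below is an added example under stated assumptions: the algebra is specified by the coordinates of $[e_i, e_j]$ in a chosen basis, and the function names are hypothetical. It uses so(3), with $[e_1,e_2]=e_3$, $[e_2,e_3]=e_1$, $[e_3,e_1]=e_2$, and the three-dimensional Heisenberg algebra, with $[e_1,e_2]=e_3$ as the only nonzero bracket.

```python
def structure_constants(n, brackets):
    """c[i][j][k] = k-th coordinate of [e_i, e_j]; antisymmetry is filled in."""
    c = [[[0] * n for _ in range(n)] for _ in range(n)]
    for (i, j), coords in brackets.items():
        for k, value in coords.items():
            c[i][j][k] = value
            c[j][i][k] = -value
    return c

def killing_form(c):
    """Matrix of K(e_a, e_b) = trace(ad_{e_a} ad_{e_b}), as in (A.7)."""
    n = len(c)
    # ad[a] is the matrix of ad_{e_a}: column j holds the coordinates of [e_a, e_j].
    ad = [[[c[a][j][k] for j in range(n)] for k in range(n)] for a in range(n)]
    return [[sum(ad[a][k][l] * ad[b][l][k] for k in range(n) for l in range(n))
             for b in range(n)] for a in range(n)]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# Indices are 0-based: (0, 1): {2: 1} encodes [e_1, e_2] = e_3.
so3 = structure_constants(3, {(0, 1): {2: 1}, (1, 2): {0: 1}, (2, 0): {1: 1}})
heisenberg = structure_constants(3, {(0, 1): {2: 1}})

K_so3 = killing_form(so3)
K_heis = killing_form(heisenberg)
print(K_so3, det3(K_so3))   # nonzero determinant: non-degenerate form
print(K_heis)               # identically zero form
```

The nonzero determinant for so(3) is consistent with Theorem A.2 (so(3) is semisimple), while the vanishing form for the nilpotent Heisenberg algebra is consistent with Theorem A.1.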

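Combining the Levi decomposition of an arbitrary Lie algebra and
the complete reducibility of a semisimple Lie algebra, we have the
following theorem [G5].

Theorem A.3: An arbitrary nonsemisimple Lie algebra $\mathscr{L}$ has a semidirect
sum structure

$$\mathscr{L} = \mathscr{R} \oplus \mathscr{S}, \qquad [\mathscr{S}, \mathscr{R}] \subset \mathscr{R} \tag{A.8}$$

where $\mathscr{R}$ is the radical of $\mathscr{L}$ and $\mathscr{S}$ is a semisimple subalgebra. Furthermore,
$\mathscr{S}$ can be written as the direct sum of simple subalgebras

$$\mathscr{S} = \mathscr{S}_1 \oplus \mathscr{S}_2 \oplus \mathscr{S}_3 \oplus \cdots, \qquad [\mathscr{S}_i, \mathscr{S}_j] = \{0\}, \quad i \ne j \tag{A.9}$$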


We define the corresponding Lie groups as in the previous section.

Definition A.12: The matrix Lie group $\{\exp \mathscr{L}\}_G$ is simple (semisimple)
if $\mathscr{L}$ is simple (semisimple).

Again, Definition A.12 is equivalent to the usual definition in
terms of properties of the group [S1].
APPENDIX B


HARMONIC ANALYSIS ON COMPACT LIE GROUPS 


B.1 Haar Measure and Group Representations

In this section we summarize some facts from the theory of integration
and representations for compact Lie groups. For details see references
[C3], [D4], [L7], [S8], [T1], and [H4].

Lemma B.1: A compact Lie group G has a regular Borel measure $\mu$ (the
Haar measure) satisfying the properties

(1) $\mu(G) < \infty$

(2) (Left invariance) $\mu(gE) = \mu(E)$
for any $g \in G$ and Borel set $E \subset G$

(3) (Right invariance) $\mu(Eg) = \mu(E)$
for any $g \in G$ and Borel set $E \subset G$

We assume henceforth that the Haar measure is normalized so that
$\int_G d\mu(g) = 1$; $d\mu(g)$ will also be denoted dg. This normalized bi-invariant
measure is unique. We now turn to the representations of compact Lie
groups.


Definition B.1: Let G be a Lie group and V a (real or complex)
finite-dimensional vector space. A finite-dimensional matrix representation
of G is a continuous homomorphism D which maps G into the group of
nonsingular linear transformations on V. That is,

(1) $D(g_1) D(g_2) = D(g_1 g_2)$ for $g_1, g_2 \in G$

(2) D(e) = I, the identity mapping on V, where e is the identity in G

(3) $g \mapsto D(g)v$ is a continuous mapping of G into V for each fixed
$v \in V$. The vector space V is called the carrier space of the
representation.

Definition B.2: The representations $D^1$ on $V^1$ and $D^2$ on $V^2$ are
equivalent representations of G if there is a vector space isomorphism
$S: V^1 \to V^2$ such that $D^1(g) = S^{-1} D^2(g) S$ for each $g \in G$. A unitary
representation is a representation in which D(g) is a unitary
transformation of V for all $g \in G$.

Theorem B.1: Any finite-dimensional representation of a compact Lie
group is equivalent to a unitary representation.

Suppose that $D^1$ and $D^2$ are representations of a compact Lie group G
on vector spaces $V^1$ and $V^2$, respectively. Then we can construct other
useful representations as follows.

Definition B.3: The direct sum $D^1 \oplus D^2$ is the representation on
$V^1 \oplus V^2$ given by $(D^1 \oplus D^2)(g)(v_1, v_2) = (D^1(g)v_1, D^2(g)v_2)$ for $g \in G$ and
$(v_1, v_2) \in V^1 \oplus V^2$. The tensor product representation $D^1 \otimes D^2$ on $V^1 \otimes V^2$
is given by $(D^1 \otimes D^2)(g)(v_1 \otimes v_2) = (D^1(g)v_1) \otimes (D^2(g)v_2)$. If $D^1$ and $D^2$
are matrix representations, then the direct sum is the matrix
representation

$$(D^1 \oplus D^2)(g) = \begin{bmatrix} D^1(g) & 0 \\ 0 & D^2(g) \end{bmatrix}$$

and the direct product representation is given by the Kronecker product
$D^1(g) \otimes D^2(g)$ [B13].
Definition B.4: A subspace $W \subset V$ is invariant under the representation
D if, for each $w \in W$ and $g \in G$, D(g)w is also in W. A representation D on
V is irreducible if V has no non-trivial D-invariant subspaces, and it is
completely reducible if it is equivalent to a direct sum of irreducible
representations.

Theorem B.2: Any finite-dimensional representation of a compact Lie
group is completely reducible; in fact it is equivalent to a direct sum
of irreducible unitary representations.

Another useful result is proved in [D5].

Theorem B.3: Any finite-dimensional representation D of a compact
Lie group G leaves invariant some positive definite hermitian form
Q(v,w); i.e.,

$$Q(D(g)v, D(g)w) = Q(v,w) \tag{B.1}$$

If D is a matrix representation and Q(v,w) = v'Qw (where Q is positive
definite), then (B.1) becomes

$$D'(g)\, Q\, D(g) = Q \tag{B.2}$$

where D' denotes the hermitian transpose of D.

B.2 Schur's Orthogonality Relations

Without loss of generality, we will henceforth consider all finite- 
dimensional representations to be matrix representations. 

Theorem B.4: Suppose that $D^1$ and $D^2$ are inequivalent irreducible
finite-dimensional unitary representations of a compact Lie group G,
with matrix elements $D^1_{ij}(g)$ and $D^2_{ij}(g)$ respectively. Then

$$\int_G D^\alpha_{ij}(g)\,[D^\beta_{mn}(g)]^*\,dg = \frac{1}{n_\alpha}\,\delta_{\alpha\beta}\,\delta_{im}\,\delta_{jn}, \qquad \alpha, \beta \in \{1, 2\} \tag{B.3}$$

where * denotes complex conjugate, $n_\alpha$ is the dimension of $D^\alpha(g)$, and
$\delta_{ij} = 1$ if i = j, $\delta_{ij} = 0$ elsewhere.

Before proceeding with the Peter-Weyl Theorem, we state a result
which applies Theorem B.4 to the reduction of an arbitrary representation
into a direct sum of irreducible representations.


Definition B.5: The character associated with a matrix representation
D of G is the function $\chi$ defined by

$$\chi(g) = \mathrm{trace}\, D(g) = \sum_{i=1}^{n} D_{ii}(g) \tag{B.4}$$

Suppose $\chi$, $\chi^1$, and $\chi^2$ are the characters of the representations D,
$D^1$, and $D^2$, respectively. If $D(g) = D^1(g) \oplus D^2(g)$, then

$$\chi(g) = \chi^1(g) + \chi^2(g) \tag{B.5}$$

if $D(g) = D^1(g) \otimes D^2(g)$, then

$$\chi(g) = \chi^1(g)\,\chi^2(g). \tag{B.6}$$

One can also show [T1] that the representations $D^1$ and $D^2$ are equivalent
if and only if $\chi^1 = \chi^2$.
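The character identities (B.5) and (B.6) reduce to two elementary facts about traces: the trace of a block diagonal matrix is the sum of the block traces, and the trace of a Kronecker product is the product of the traces. The following sketch checks both on small integer matrices, together with the mixed-product rule that makes the Kronecker product of two representations a representation. The helper names are hypothetical and the matrices are arbitrary test data, not group elements.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def direct_sum(A, B):
    """Block diagonal matrix diag(A, B), the matrix form in Definition B.3."""
    p, q = len(A), len(B)
    return [[A[i][j] if i < p and j < p
             else B[i - p][j - p] if i >= p and j >= p
             else 0
             for j in range(p + q)] for i in range(p + q)]

def kron(A, B):
    """Kronecker product of a p x p matrix A with a q x q matrix B."""
    p, q = len(A), len(B)
    return [[A[i // q][j // q] * B[i % q][j % q] for j in range(p * q)]
            for i in range(p * q)]

A = [[1, 2], [3, 4]]
C = [[2, 0], [1, 1]]
B = [[1, 1, 0], [0, 2, 0], [0, 0, 3]]
D = [[1, 1, 0], [0, 1, 0], [0, 0, 2]]

print(trace(direct_sum(A, B)), trace(A) + trace(B))   # analogue of (B.5)
print(trace(kron(A, B)), trace(A) * trace(B))         # analogue of (B.6)
# Mixed-product rule: (A (x) B)(C (x) D) = (AC) (x) (BD)
print(matmul(kron(A, B), kron(C, D)) == kron(matmul(A, C), matmul(B, D)))
```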

According to Theorem B.2, any finite-dimensional representation D
of the compact Lie group G is equivalent to the direct sum of irreducible
unitary representations

$$D(g) \cong D^{\ell_1}(g) \oplus \cdots \oplus D^{\ell_p}(g) \tag{B.7}$$

Then by (B.5),

$$\chi(g) = \chi^{\ell_1}(g) + \cdots + \chi^{\ell_p}(g) \tag{B.8}$$

and hence

$$\chi(g) = \sum_i \nu_i\, \chi^{\ell_i}(g) \tag{B.9}$$

where $\chi$ is the character of D, $\chi^{\ell_i}$ is the character of $D^{\ell_i}$, $\nu_i$ is the
number of times the irreducible representation $D^{\ell_i}$ occurs in the sum
(B.7), and the summation is over the set of equivalence classes of
finite-dimensional irreducible representations of G. The following corollaries
to Theorem B.4 are immediate.


Corollary B.1: The characters of the irreducible unitary representations
$D^1$ and $D^2$ of the compact Lie group G satisfy

$$\int_G \chi^1(g)\,[\chi^2(g)]^*\,dg = \begin{cases} 1, & \text{if } D^1 \text{ and } D^2 \text{ are equivalent} \\ 0, & \text{otherwise} \end{cases} \tag{B.10}$$

Furthermore, if a representation D is decomposed as in (B.7) - (B.9), then

$$\nu_i = \int_G \chi(g)\,[\chi^{\ell_i}(g)]^*\,dg \tag{B.11}$$

Corollary B.2: Let $D^1$ and $D^2$ be irreducible representations of the
compact Lie group G. Assume that $D^1 \otimes D^2$ is equivalent to $D^{\ell_1} \oplus \cdots \oplus D^{\ell_K}$,
where the $D^{\ell_i}$ are irreducible representations. Then

$$\nu_i = \int_G \chi^1(g)\,\chi^2(g)\,[\chi^{\ell_i}(g)]^*\,dg \tag{B.12}$$
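The character integrals (B.10) and (B.12) can be checked numerically for G = SO(3). The sketch below rests on two standard facts that are assumptions here, since they are not stated in this appendix: the character of the irreducible representation of dimension $2\ell+1$ depends only on the rotation angle $\omega$ and equals $1 + 2\sum_{m=1}^{\ell}\cos m\omega$, and for such class functions the normalized Haar integral reduces to $\frac{1}{\pi}\int_0^\pi (\cdot)(1-\cos\omega)\,d\omega$. The function names are hypothetical.

```python
import math

def chi(l, omega):
    """Character of the (2l+1)-dimensional irreducible representation of SO(3)."""
    return 1.0 + 2.0 * sum(math.cos(m * omega) for m in range(1, l + 1))

def class_integral(f, n=2000):
    """Normalized Haar integral of a class function f(omega) on SO(3),
    evaluated with the midpoint rule on [0, pi]."""
    h = math.pi / n
    total = 0.0
    for i in range(n):
        w = (i + 0.5) * h
        total += f(w) * (1.0 - math.cos(w))
    return total * h / math.pi

def inner(l1, l2):
    """Left side of (B.10); the characters are real, so no conjugate is needed."""
    return class_integral(lambda w: chi(l1, w) * chi(l2, w))

def multiplicity(j, k, l):
    """Right side of (B.12): multiplicity of D^l in the tensor product of D^j and D^k."""
    return class_integral(lambda w: chi(j, w) * chi(k, w) * chi(l, w))

print(round(inner(2, 2), 9), round(inner(1, 3), 9))
print([round(multiplicity(1, 2, l)) for l in range(5)])
```

For j = 1 and k = 2 the multiplicities are nonzero only for l = 1, 2, 3, which is the pattern expected from the Clebsch-Gordan series for SO(3).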
In the case that G is semisimple, Steinberg [S9], [B20], [J4] gives
an alternate formula for $\nu_i$ of Corollary B.2.

B.3 The Peter-Weyl Theorem

In this section we state the major result in harmonic analysis on
compact Lie groups [C3], [D4], [H4], [L7], [S8], [T1], [W11].

Definition B.6: The representative ring of a compact Lie group is
the ring generated over the field of complex numbers by the set of all
continuous functions $D_{ij}$ which are matrix elements of some unitary
irreducible representation D.

Theorem B.5 (The Peter-Weyl Theorem): Let G be a compact Lie group.

(a) The representative ring is dense in the space of complex-valued
continuous functions on G in the uniform norm. That is, if f is
a continuous function on G, and if $\epsilon > 0$ is given, then there is a
function $\tilde f$ in the representative ring such that $|f(g) - \tilde f(g)| < \epsilon$ for all
$g \in G$.

(b) Let A be the set of equivalence classes of finite-dimensional
irreducible representations of G. For each $\alpha \in A$, pick a unitary
representation $D^\alpha$. If $f \in L_2(G)$, define the Fourier coefficient

$$f^\alpha_{ij} = \int_G f(g)\,[D^\alpha_{ij}(g)]^*\,dg \tag{B.13}$$

Then the set of functions $\{\sqrt{n_\alpha}\, D^\alpha_{ij}\}$ is a complete orthonormal set in
$L_2(G)$; i.e., we have the Parseval identity

$$\|f\|^2 = \int_G |f(g)|^2\,dg = \sum_{\alpha \in A} n_\alpha \sum_{i,j=1}^{n_\alpha} |f^\alpha_{ij}|^2 \tag{B.14}$$

where $n_\alpha$ is defined in Theorem B.4.

The sum in (B.14) is defined in [R2, p. 84]; notice that (B.14)
implies that the set of all $\alpha$ such that $f^\alpha_{ij} \ne 0$ for some i and j is at
most countable.

The Peter-Weyl Theorem thus yields the direct sum decomposition

$$L_2(G) = \bigoplus_{\alpha \in A} H_\alpha \tag{B.15}$$

where $H_\alpha$ denotes the vector space spanned by the $(n_\alpha)^2$ functions
$\{D^\alpha_{ij};\ i, j = 1, \ldots, n_\alpha\}$.


B.4 The Laplacian 

The Laplacian on a compact Lie group is closely connected with the 
theory of harmonic analysis presented in the previous sections. 


Theorem B.6 [S8, p. 35]: Let G be a compact Lie group. There exists
a second-order differential operator $\Delta$ on G (the Laplacian), such that

(a) $\Delta$ is bi-invariant; i.e., for all $f \in C^\infty(G)$ and $h \in G$,

$$\Delta(R_h f) = R_h(\Delta f) \tag{B.16}$$

$$\Delta(L_h f) = L_h(\Delta f) \tag{B.17}$$

where $R_h$ is the right translation defined by $(R_h f)(g) = f(gh)$, and $L_h$ is
the left translation $(L_h f)(g) = f(h^{-1}g)$;

(b) $\Delta$ is elliptic;

(c) $\Delta$ is formally self-adjoint; i.e., for any $f_1, f_2 \in C^\infty(G)$

$$\int_G f_1(g)\,[\Delta f_2(g)]^*\,dg = \int_G [\Delta f_1(g)]\, f_2^*(g)\,dg$$

(d) $\Delta$ maps constant functions to zero.

If G is a compact matrix Lie group and $\{A_1, \ldots, A_n\}$ is a basis for
the Lie algebra of G, then the Laplacian can be expressed in the form

$$\Delta = \sum_{i,j=1}^{n} Q_{ij}\, D_i D_j \tag{B.18}$$

where Q is a symmetric positive definite matrix and the differential
operators $D_i$ are defined as follows: let f be a function on the group G;
then we define for $g \in G$

$$(D_i f)(g) = \frac{d}{dt}\, f\big((\exp(A_i t))\, g\big)\Big|_{t=0} \tag{B.19}$$


We note that the Laplacian is not necessarily unique; however, it
is unique if G is simple [S8, p. 36]. We will subsequently work with a
single differential operator $\Delta$ on G satisfying Theorem B.6; however, a
different choice of Q in (B.18) will define an equally valid Laplacian.

It can be shown that $\Delta$ is the Laplace-Beltrami operator on G
corresponding to a suitably defined bi-invariant Riemannian metric on
G (see [H3], [S8], [W11]). A Riemannian metric on a manifold M is a
smooth choice of a positive definite inner product $\langle \cdot, \cdot \rangle_m$ on the tangent
space $M_m$ at each point $m \in M$. If $\phi: m \to (x_1(m), \ldots, x_n(m))$ is a
coordinate system valid on an open set $U \subset M$, we define the functions
$g_{ij}$, $g^{ij}$, g on U by

$$g_{ij}(m) = \left\langle \frac{\partial}{\partial x_i}\Big|_m, \frac{\partial}{\partial x_j}\Big|_m \right\rangle_m \tag{B.20}$$

$$\sum_{j=1}^{n} g_{ij}(m)\, g^{jk}(m) = \delta_{ik} \tag{B.21}$$

$$g(m) = |\det(g_{ij}(m))| \tag{B.22}$$

Then the Laplace-Beltrami operator in terms of local coordinates is

$$\Delta f = \frac{1}{\sqrt{g}} \sum_{i,k=1}^{n} \frac{\partial}{\partial x_k}\left( g^{ik} \sqrt{g}\, \frac{\partial f}{\partial x_i} \right) \tag{B.23}$$


The next theorem relates the eigenfunctions of the Laplacian to 
the representative ring. 

Theorem B.7 [S8, p. 40], [W11, p. 257]: Let G be a compact Lie
group, and let $H_\alpha$ be defined as in equation (B.15). Then each function
$\phi \in H_\alpha$ is an eigenfunction of the bi-invariant Laplacian $\Delta$, and all $\phi \in H_\alpha$
have the same eigenvalue $\lambda_\alpha$. Conversely, each eigenfunction $\phi$ of the
Laplacian is an element of the representative ring.

Hence, harmonic analysis on a compact Lie group can be performed
either in terms of the representative ring or the eigenfunctions of the
bi-invariant Laplacian, since these two sets of functions are the same.


B.5 Harmonic Analysis on SO(n) and $S^n$

In this section we discuss the application of the results of the
previous sections to the special orthogonal group SO(n) and the n-sphere $S^n$.
The results for SO(3) and $S^2$ will be discussed in detail.
The Lie group SO(n) is defined by

$$SO(n) = \{X \in R^{n \times n} \mid X^T X = I,\ \det X = +1\} \tag{B.24}$$

The theory of representations of SO(n) is discussed in [B23], [H5], [H6],
[J5], [L9], [M16]. We will present only a brief summary of the subject;
the reader is referred to the references for details. Each irreducible
representation of SO(n) can be characterized by a set of k integers
(where n = 2k + 1 if n is odd, and n = 2k if n is even). This set of
k integers can be either the highest weight $(\lambda) = (\lambda_1, \ldots, \lambda_k)$ (see [B20],
[B23], [H6]) or the Young pattern $[f] = [f_1, \ldots, f_k]$, where $f_i \ge f_{i+1}$
and $f_k \ge 0$ (see [H5], [J5], [L9], [M16]). The two notions are related by

$$\lambda_i = f_i - f_{i+1}, \quad i = 1, \ldots, k-1; \qquad \lambda_k = f_k \tag{B.25}$$

We denote an irreducible representation corresponding to $(\lambda)$ or [f] by
$D^{(\lambda)}$ or $D^{[f]}$; the dimension of $D^{(\lambda)}$ is denoted by $n_{(\lambda)}$, and is computed
in [B23], [H5]. For $X \in SO(n)$, the representation $D^{(1,0,\ldots,0)}(X) = X$
is called the self-representation.

Given a matrix representation D of SO(n), Theorem B.2 states that
there exists a nonsingular matrix P such that, for $g \in SO(n)$

$$D(g) = P\,\big(D^{(\lambda_1)}(g) \oplus \cdots \oplus D^{(\lambda_p)}(g)\big)\,P^{-1} \tag{B.26}$$

where the $\{D^{(\lambda_i)}\}$ are irreducible representations. It is often necessary
to compute the transformation matrix P; in particular, one must sometimes
decompose the tensor product of two representations. Consider the tensor
product $D^{(\lambda_i)} \otimes D^{(\lambda_k)}$ of two unitary irreducible representations. The
number of times $\nu_j$ that the irreducible representation $D^{(\lambda_j)}$ occurs in
the decomposition of $D^{(\lambda_i)} \otimes D^{(\lambda_k)}$ can be calculated from the highest
weights, the Young tableaux, or the characters via (B.12); the result
is the Clebsch-Gordan series [B20], [J5], [L9], [M16]

$$D^{(\lambda_i)} \otimes D^{(\lambda_k)} \cong \bigoplus_j \nu_j\, D^{(\lambda_j)} \tag{B.27}$$

The elements of the matrix which transforms $D^{(\lambda_i)} \otimes D^{(\lambda_k)}$ into the direct
sum (B.27) can also be computed [J5], [L9]; these elements are known as
the Clebsch-Gordan, Wigner, or vector coupling coefficients.


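Now we consider the special case SO(3). Any matrix R in SO(3) has
an Euler angle representation of the form [T1]

$$R = Z(\phi)\,X(\theta)\,Z(\psi) \tag{B.28}$$

where

$$Z(\phi) = \begin{bmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad X(\theta) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix} \tag{B.29}$$

and the Euler angles $\phi$, $\theta$, $\psi$ have the domain $0 \le \phi < 2\pi$, $0 \le \theta \le \pi$,
$0 \le \psi < 2\pi$. The element of SO(3) with the representation (B.28) will be
denoted by $R(\phi, \theta, \psi)$ or just $(\phi, \theta, \psi)$.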
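The Euler angle parametrization (B.28)-(B.29) is easy to exercise numerically. The following sketch builds $R = Z(\phi)X(\theta)Z(\psi)$ and checks the two defining properties of SO(3) in (B.24), orthogonality and unit determinant. The function names and the sample angles are illustrative choices, not taken from the text.

```python
import math

def Z(angle):
    """Rotation about the third axis, as in (B.29)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def X(angle):
    """Rotation about the first axis, as in (B.29)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def euler(phi, theta, psi):
    """R(phi, theta, psi) = Z(phi) X(theta) Z(psi), equation (B.28)."""
    return matmul(matmul(Z(phi), X(theta)), Z(psi))

R = euler(0.3, 1.1, 2.0)
gram = matmul(transpose(R), R)          # should be the identity matrix
max_dev = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
              for i in range(3) for j in range(3))
print(max_dev, det3(R))
```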


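In the Euler angle coordinates, the bi-invariant Riemannian metric
on SO(3) is given by [B22]

$$(ds)^2 = d\theta^2 + d\phi^2 + 2\cos\theta\, d\phi\, d\psi + d\psi^2 \tag{B.30}$$

i.e., the matrix g of (B.20) is given by

$$g(\phi, \theta, \psi) = \begin{bmatrix} 1 & 0 & \cos\theta \\ 0 & 1 & 0 \\ \cos\theta & 0 & 1 \end{bmatrix} \tag{B.31}$$

The (unnormalized) Haar measure is thus

$$d\mu(\phi, \theta, \psi) = \sqrt{|\det g|}\; d\phi\, d\theta\, d\psi = \sin\theta\; d\phi\, d\theta\, d\psi \tag{B.32}$$

The bi-invariant Laplace-Beltrami operator corresponding to the metric
(B.30) is given by [B22]

$$\Delta_{SO(3)} = \frac{1}{\sin\theta} \frac{\partial}{\partial\theta}\left(\sin\theta\, \frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\left(\frac{\partial^2}{\partial\phi^2} - 2\cos\theta\, \frac{\partial^2}{\partial\phi\,\partial\psi} + \frac{\partial^2}{\partial\psi^2}\right) \tag{B.33}$$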


For SO(3), notice that n = 3 and k = 1; thus the highest weights
and Young tableaux, and hence the irreducible representations, are
characterized by a single integer. Talman [T1] computes a sequence
$D^\ell(\phi, \theta, \psi)$, $\ell = 0, 1, \ldots$, of unitary irreducible representations of
SO(3); its matrix elements are given by

$$D^\ell_{mn}(\phi, \theta, \psi) = i^{m-n}\, e^{-im\phi}\, d^\ell_{mn}(\theta)\, e^{-in\psi} \tag{B.34}$$

where

$$d^\ell_{mn}(\theta) = \sum_t (-1)^t\, \frac{[(\ell+m)!\,(\ell-m)!\,(\ell+n)!\,(\ell-n)!]^{1/2}}{(\ell+m-t)!\,(t+n-m)!\,t!\,(\ell-n-t)!}\, \cos^{2\ell+m-n-2t}\!\left(\frac{\theta}{2}\right) \sin^{2t+n-m}\!\left(\frac{\theta}{2}\right) \tag{B.35}$$

for $-\ell \le m, n \le \ell$.
Here t is summed over all nonnegative integers such that the arguments
of the factorial functions in (B.35) are nonnegative; i.e.,

$$m - n \le t \le \ell + m, \qquad 0 \le t \le \ell - n$$
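The sum in (B.35) is finite and can be evaluated directly. The sketch below is an illustrative transcription of that formula, with the summation limits taken from the factorial constraints just stated; the function name `wigner_d` is a hypothetical label. It checks two properties that follow from unitarity of $D^\ell$: the matrix $d^\ell(\theta)$ is real orthogonal, and it reduces to the identity at $\theta = 0$.

```python
from math import factorial, sqrt, cos, sin

def wigner_d(l, m, n, theta):
    """d^l_{mn}(theta) evaluated from the finite sum (B.35)."""
    norm = sqrt(factorial(l + m) * factorial(l - m)
                * factorial(l + n) * factorial(l - n))
    total = 0.0
    # t ranges over integers keeping every factorial argument nonnegative.
    for t in range(max(0, m - n), min(l + m, l - n) + 1):
        denom = (factorial(l + m - t) * factorial(t + n - m)
                 * factorial(t) * factorial(l - n - t))
        total += ((-1) ** t * norm / denom
                  * cos(theta / 2) ** (2 * l + m - n - 2 * t)
                  * sin(theta / 2) ** (2 * t + n - m))
    return total

def d_matrix(l, theta):
    """The (2l+1) x (2l+1) matrix with entries d^l_{mn}, m, n = -l, ..., l."""
    idx = range(-l, l + 1)
    return [[wigner_d(l, m, n, theta) for n in idx] for m in idx]

theta = 0.7
d2 = d_matrix(2, theta)
# Largest deviation of d2 * d2^T from the identity matrix.
ortho_err = max(abs(sum(d2[i][k] * d2[j][k] for k in range(5))
                    - (1.0 if i == j else 0.0))
                for i in range(5) for j in range(5))
print(ortho_err)
print(wigner_d(1, 0, 0, theta), cos(theta))   # d^1_{00}(theta) = cos(theta)
```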


In fact, these are (up to equivalence) all of the irreducible
representations of SO(3). The Peter-Weyl Theorem yields the decomposition

$$L_2(SO(3)) = \bigoplus_{\ell} H_\ell \tag{B.36}$$

where $H_\ell$ is the vector space spanned by the $(2\ell+1)^2$ functions
$\{D^\ell_{mn};\ m, n = -\ell, \ldots, \ell\}$.

The Clebsch-Gordan series for SO(3) is given by [H4, p. 135],
[T1, p. 116]

$$D^j \otimes D^k \cong \bigoplus_{\ell = |j-k|}^{j+k} D^\ell \tag{B.37}$$

The elements of the matrix which transforms $D^j \otimes D^k$ into the direct
sum (B.37) are defined as follows [T1, p. 118]. Assume that

$$D^j_{m',m}(\phi, \theta, \psi)\, D^k_{n',n}(\phi, \theta, \psi) = \sum_{\ell,\,p,\,p'} (2\ell+1) \begin{pmatrix} j & k & \ell \\ m' & n' & p' \end{pmatrix} \begin{pmatrix} j & k & \ell \\ m & n & p \end{pmatrix} D^\ell_{p',p}(\phi, \theta, \psi)^* \tag{B.38}$$

where $|j-k| \le \ell \le j+k$ and $-\ell \le p, p' \le \ell$, and * denotes complex conjugate.
The coefficients $\begin{pmatrix} j & k & \ell \\ m & n & p \end{pmatrix}$ (known as the 3-j, Clebsch-Gordan, or vector
coupling coefficients) are given by

$$\begin{pmatrix} j & k & \ell \\ m & n & p \end{pmatrix} = (-1)^{2j-k+n} \left[ \frac{(j+k-\ell)!\,(k+\ell-j)!\,(\ell+j-k)!\,(\ell+p)!\,(\ell-p)!}{(j+k+\ell+1)!\,(j+m)!\,(j-m)!\,(k+n)!\,(k-n)!} \right]^{1/2} \sum_t (-1)^t\, \frac{(\ell+j-n-t)!\,(k+n+t)!}{(\ell+p-t)!\,(t+k-j-p)!\,t!\,(\ell-k+j-t)!} \tag{B.39}$$

where the sum is over integral values of t such that the arguments of
the factorial functions in (B.39) are nonnegative. These coefficients
are widely used by physicists [W19], and are tabulated in [B24].
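A closed form such as (B.39) is easy to mistranscribe, so a direct evaluation against a few well-known values is a useful check. The sketch below is an added illustration, assuming the standard selection rules for 3-j symbols (m + n + p = 0 and the triangle condition), which the text leaves implicit; the function name `three_j` is hypothetical. The comparison values are the textbook results $\begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = -1/\sqrt{3}$, $\begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \end{pmatrix} = 1/\sqrt{6}$ and $\begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix} = 0$.

```python
from math import factorial as fact, sqrt

def three_j(j, k, l, m, n, p):
    """3-j symbol (j k l; m n p) for integer arguments, evaluated from (B.39).

    Selection rules (an assumption made explicit here): the symbol vanishes
    unless m + n + p = 0, |j - k| <= l <= j + k, and |m| <= j, |n| <= k, |p| <= l.
    """
    if m + n + p != 0 or not (abs(j - k) <= l <= j + k):
        return 0.0
    if abs(m) > j or abs(n) > k or abs(p) > l:
        return 0.0
    prefactor = sqrt(
        fact(j + k - l) * fact(k + l - j) * fact(l + j - k)
        * fact(l + p) * fact(l - p)
        / (fact(j + k + l + 1) * fact(j + m) * fact(j - m)
           * fact(k + n) * fact(k - n)))
    # Limits of t keep every factorial argument in the sum nonnegative.
    t_min = max(0, j - k + p)
    t_max = min(l + p, l - k + j, l + j - n)
    total = 0.0
    for t in range(t_min, t_max + 1):
        total += ((-1) ** t * fact(l + j - n - t) * fact(k + n + t)
                  / (fact(l + p - t) * fact(t + k - j - p)
                     * fact(t) * fact(l - k + j - t)))
    return (-1) ** (2 * j - k + n) * prefactor * total

print(three_j(1, 1, 0, 0, 0, 0), -1 / sqrt(3))
print(three_j(1, 1, 1, 1, -1, 0), 1 / sqrt(6))
print(three_j(1, 1, 1, 0, 0, 0))
```

The same routine also reproduces the dimension count implied by (B.37): the values of $\ell$ admitted by the triangle condition satisfy $\sum_\ell (2\ell+1) = (2j+1)(2k+1)$.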

The n-sphere $S^n = \{x \in R^{n+1} \mid x'x = 1\}$ is diffeomorphic to the
homogeneous space SO(n+1)/SO(n). Harmonic analysis on $S^n$ is studied in
terms of the spherical harmonics [J4], [E1], [S13], [T1], [V1]. Let
$\mathscr{P}_\ell$ denote the space of homogeneous polynomials of degree $\ell$ on $R^{n+1}$
(i.e., $f(cx_1, \ldots, cx_{n+1}) = c^\ell f(x_1, \ldots, x_{n+1})$). Then the space $H_\ell$ of
spherical harmonics of degree $\ell$ on $S^n$ can be characterized in the
following equivalent ways:

(1) the restriction to $S^n$ of the subspace $\{f \in \mathscr{P}_\ell \mid \Delta_{R^{n+1}} f = 0\}$
of harmonic homogeneous polynomials, where

$$\Delta_{R^{n+1}} = \sum_{i=1}^{n+1} \frac{\partial^2}{\partial x_i^2} \tag{B.40}$$

(2) the restriction to $S^n$ of the subspace of $\mathscr{P}_\ell$ which is orthogonal
to the subspace $\{(x_1^2 + \cdots + x_{n+1}^2)\, f(x_1, \ldots, x_{n+1}) \mid f \in \mathscr{P}_{\ell-2}\}$ (each of these
subspaces is invariant under the action of SO(n+1));

(3) the irreducible subspace of $L_2(S^n)$ which is the carrier space
of the irreducible representation of SO(n+1) of highest weight $(\ell, 0, \ldots, 0)$
(this representation is obtained by reducing the representation
$[D(X)f](x) = f(X^{-1}x)$, where $X \in SO(n+1)$, $x \in S^n$, $f \in L_2(S^n)$);

(4) the eigenspace of the SO(n+1)-invariant Laplacian $\Delta_{S^n}$ on $S^n$
with eigenvalue $-\ell(n-1+\ell)$.

Property (2) implies that each $f \in \mathscr{P}_\ell$ has a unique expansion

$$f(x) = \sum_{j=0}^{[\ell/2]} |x|^{2j}\, f_{\ell-2j}(x) \tag{B.41}$$

where $f_{\ell-2j}$ is a harmonic homogeneous polynomial of degree $\ell - 2j$ and
$[\ell/2]$ is the largest integer $\le \ell/2$ (Brockett [B3]
also discusses this point). One can show [D4, p. 109] that the span of
$\{H_\ell,\ \ell = 0, 1, 2, \ldots\}$ is dense in the space of continuous functions on $S^n$ and
in $L_p(S^n)$, $1 \le p < \infty$. Notice also that, using property (3) and the
Clebsch-Gordan coefficients for SO(n+1), we can write the product of
two spherical harmonics as a linear combination of spherical harmonics.


Now we turn to the 2-sphere $S^2$. Any point $(x_1, x_2, x_3)$ on $S^2$ can
be expressed in the polar coordinates $(\theta, \phi)$, where $0 \le \theta \le \pi$,
$0 \le \phi < 2\pi$, by defining

$$x_1 = \cos\theta; \quad x_2 = \sin\theta \cos\phi; \quad x_3 = \sin\theta \sin\phi \tag{B.42}$$

Notice that the point $(\theta, \phi)$ on $S^2$ can be viewed as the coset $\{(\phi, \theta, \psi),\ \psi \in [0, 2\pi)\}$.
In polar coordinates, the Riemannian metric invariant under
the action of SO(3) is

$$(ds)^2 = d\theta^2 + \sin^2\theta\, d\phi^2 \tag{B.43}$$

and the corresponding Riemannian measure is

$$d\mu(\theta, \phi) = \sin\theta\; d\theta\, d\phi \tag{B.44}$$

The corresponding invariant Laplace-Beltrami operator is [B3]

$$\Delta_{S^2} = \frac{1}{\sin\theta} \frac{\partial}{\partial\theta}\left(\sin\theta\, \frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta} \frac{\partial^2}{\partial\phi^2} \tag{B.45}$$

Since functions on $S^2$ can be viewed as functions on SO(3) which are
independent of $\psi$, the Laplace-Beltrami operator $\Delta_{S^2}$ can also be obtained
from $\Delta_{SO(3)}$ (see (B.33)) by setting $\partial/\partial\psi = 0$.
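Property (4) of the spherical harmonics, specialized to n = 2, says that a harmonic of degree $\ell$ is an eigenfunction of the operator (B.45) with eigenvalue $-\ell(\ell+1)$. The sketch below checks this with central finite differences for two simple examples written in the coordinates (B.42): $x_2 = \sin\theta\cos\phi$ (degree 1) and $3x_1^2 - 1 = 3\cos^2\theta - 1$ (degree 2). The step size, the evaluation point, and the function names are arbitrary illustrative choices.

```python
import math

def laplace_s2(f, theta, phi, h=1e-3):
    """Finite-difference approximation of the Laplace-Beltrami operator (B.45)."""
    def flux(t):
        # sin(t) * df/dtheta, with a central difference for the derivative
        return math.sin(t) * (f(t + h / 2, phi) - f(t - h / 2, phi)) / h
    theta_part = (flux(theta + h / 2) - flux(theta - h / 2)) / (h * math.sin(theta))
    phi_part = ((f(theta, phi + h) - 2.0 * f(theta, phi) + f(theta, phi - h))
                / (h * h * math.sin(theta) ** 2))
    return theta_part + phi_part

def degree_one(theta, phi):
    return math.sin(theta) * math.cos(phi)        # x_2 in (B.42)

def degree_two(theta, phi):
    return 3.0 * math.cos(theta) ** 2 - 1.0       # 3 x_1^2 - 1 restricted to S^2

theta0, phi0 = 1.0, 0.5
ratio1 = laplace_s2(degree_one, theta0, phi0) / degree_one(theta0, phi0)
ratio2 = laplace_s2(degree_two, theta0, phi0) / degree_two(theta0, phi0)
print(ratio1, ratio2)   # close to -1*(1+1) = -2 and -2*(2+1) = -6
```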

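We now consider the spherical harmonics on $S^2$. The normalized
spherical harmonics of degree $\ell$ are defined by [T1]

$$Y_{\ell m}(\theta, \phi) = (-1)^m \left[ \frac{(\ell-m)!}{(\ell+m)!}\, \frac{(2\ell+1)}{4\pi} \right]^{1/2} P_{\ell m}(\cos\theta)\, e^{im\phi} \tag{B.46}$$

$$Y_{\ell,-m}(\theta, \phi) = (-1)^m\, Y^*_{\ell m}(\theta, \phi) \tag{B.47}$$

for $\ell = 0, 1, \ldots$ and $m = 0, 1, \ldots, \ell$, where $P_{\ell m}(\cos\theta)$ are the associated
Legendre functions. These functions satisfy the four properties of
spherical harmonics listed above. In particular, property (3) implies
that [T1]

$$Y_{\ell m}(\theta, \phi) = \left[\frac{2\ell+1}{4\pi}\right]^{1/2} D^\ell_{m0}\!\left(\phi + \frac{\pi}{2}, \theta, 0\right)^* = \left[\frac{2\ell+1}{4\pi}\right]^{1/2} e^{im\phi}\, d^\ell_{m0}(\theta) \tag{B.48}$$

The product of two spherical harmonics can be readily expanded by
employing (B.38) with m = n = 0:

$$Y_{\ell m}(\theta, \phi)\, Y_{\ell' m'}(\theta, \phi) = (-1)^{m+m'} \sum_{j=|\ell-\ell'|}^{\ell+\ell'} \left[\frac{(2\ell+1)(2\ell'+1)(2j+1)}{4\pi}\right]^{1/2} \begin{pmatrix} \ell & \ell' & j \\ m & m' & -(m+m') \end{pmatrix} \begin{pmatrix} \ell & \ell' & j \\ 0 & 0 & 0 \end{pmatrix} Y_{j,\,m+m'}(\theta, \phi) \tag{B.49}$$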
APPENDIX C


THE FUBINI THEOREM FOR CONDITIONAL EXPECTATION

Many of the proofs in Chapter 5 and Appendix D require the interchange
of the operations of conditional expectation and integration with respect
to time. In this appendix we will prove a theorem which justifies this
interchange under certain hypotheses.

Let X be a random variable defined on a probability space $(\Omega, \mathscr{F}, P)$
and assume that the expected value E[X] is well-defined. Let $\mathscr{G}$ be a
sub-$\sigma$-field of $\mathscr{F}$.

Definition C.1 [W8]: The conditional expectation $E^{\mathscr{G}} X = E[X \mid \mathscr{G}]$
is P-almost surely uniquely defined by the following two conditions:

(a) $E^{\mathscr{G}} X$ is measurable with respect to $\mathscr{G}$

(b) Let $I_A$ denote the indicator function of the set A. Then

$$E[I_A\, E^{\mathscr{G}} X] = E[I_A\, X] \quad \text{for all } A \in \mathscr{G} \tag{C.1}$$

We will first need the usual Fubini theorem [R2].


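Lemma C.1 (Fubini's Theorem): Let $(\Omega_i, \mathscr{F}_i, \mu_i)$, i = 1, 2, be $\sigma$-finite
measure spaces, and let $\mu_1 \times \mu_2$ be the product measure defined on $\mathscr{F}_1 \times \mathscr{F}_2$.
Also, if $h: \Omega_1 \times \Omega_2 \to R$, define the sections $h_{\omega_1}: \Omega_2 \to R$ and $h_{\omega_2}: \Omega_1 \to R$
by

$$h_{\omega_1}(\omega_2) = h(\omega_1, \omega_2) \quad \text{for } \omega_2 \in \Omega_2 \tag{C.2}$$

$$h_{\omega_2}(\omega_1) = h(\omega_1, \omega_2) \quad \text{for } \omega_1 \in \Omega_1 \tag{C.3}$$

(a) If $h: \Omega_1 \times \Omega_2 \to R$ is $\mathscr{F}_1 \times \mathscr{F}_2$-measurable and $\mu_1 \times \mu_2$-integrable,
then $h_{\omega_1}: \Omega_2 \to R$ is $\mu_2$-integrable for $\mu_1$-almost all $\omega_1$, and $h_{\omega_2}: \Omega_1 \to R$
is $\mu_1$-integrable for $\mu_2$-almost all $\omega_2$. Furthermore, the functions

$$\omega_1 \mapsto \int_{\Omega_2} h_{\omega_1}\, d\mu_2 \quad \text{and} \quad \omega_2 \mapsto \int_{\Omega_1} h_{\omega_2}\, d\mu_1,$$

defined $\mu_1$-almost everywhere and $\mu_2$-almost everywhere, are $\mu_1$-integrable
and $\mu_2$-integrable, respectively, and

$$\int_{\Omega_1 \times \Omega_2} h\, d(\mu_1 \times \mu_2) = \int_{\Omega_1} \left( \int_{\Omega_2} h_{\omega_1}\, d\mu_2 \right) d\mu_1 = \int_{\Omega_2} \left( \int_{\Omega_1} h_{\omega_2}\, d\mu_1 \right) d\mu_2 \tag{C.4}$$

(b) If $h: \Omega_1 \times \Omega_2 \to R$ is $\mathscr{F}_1 \times \mathscr{F}_2$-measurable and
$\int_{\Omega_1} \left( \int_{\Omega_2} |h_{\omega_1}|\, d\mu_2 \right) d\mu_1$
is finite, then h is $\mu_1 \times \mu_2$-integrable, and thus the conclusions of (a)
hold.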


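Theorem C.1 (Fubini Theorem for Conditional Expectation): Let $(\Omega, \mathscr{F}, P)$
be a probability space and consider the measure space $([0,t], \mathscr{B}, m)$, where
t is finite and m is the Lebesgue measure on the $\sigma$-field $\mathscr{B}$ of Borel sets in
[0,t]. Let $\mathscr{G}$ be a sub-$\sigma$-field of $\mathscr{F}$. Assume that

(a) $f: [0,t] \times \Omega \to R$ is $\mathscr{B} \times \mathscr{F}$-measurable

(b) $E^{\mathscr{G}}[f]: [0,t] \times \Omega \to R$ is $\mathscr{B} \times \mathscr{G}$-measurable

(c) $\int_0^t \left( \int_\Omega I_A(\omega)\, |f_s(\omega)|\, dP(\omega) \right) ds$ is finite for all $A \in \mathscr{G}$.

Then

$$E^{\mathscr{G}}\left[ \int_0^t f_s(\omega)\, ds \right] = \int_0^t E^{\mathscr{G}}[f_s(\omega)]\, ds \tag{C.5}$$

(P-almost surely).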


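Proof: Since $E^{\mathscr{G}}[f_s(\omega)]$ is $\mathscr{B} \times \mathscr{G}$-measurable, it follows that
$\int_0^t E^{\mathscr{G}}[f_s(\omega)]\, ds$ is $\mathscr{G}$-measurable. Thus, by Definition C.1, we need
only show that, for all $A \in \mathscr{G}$,

$$E\left[ I_A(\omega) \int_0^t E^{\mathscr{G}}[f_s(\omega)]\, ds \right] = E\left[ I_A(\omega) \int_0^t f_s(\omega)\, ds \right] \tag{C.6}$$

However,

$$E\left[ I_A(\omega) \int_0^t E^{\mathscr{G}}[f_s(\omega)]\, ds \right] = \int_\Omega I_A(\omega) \left( \int_0^t E^{\mathscr{G}}[f_s(\omega)]\, ds \right) dP(\omega) \tag{C.7}$$

$$= \int_\Omega \left( \int_0^t I_A(\omega)\, E^{\mathscr{G}}[f_s(\omega)]\, ds \right) dP(\omega) \tag{C.8}$$

$$= \int_0^t \left( \int_\Omega I_A(\omega)\, E^{\mathscr{G}}[f_s(\omega)]\, dP(\omega) \right) ds \tag{C.9}$$

$$= \int_0^t \left( \int_\Omega I_A(\omega)\, f_s(\omega)\, dP(\omega) \right) ds \tag{C.10}$$

$$= \int_\Omega I_A(\omega) \left( \int_0^t f_s(\omega)\, ds \right) dP(\omega) \tag{C.11}$$

$$= E\left[ I_A(\omega) \int_0^t f_s(\omega)\, ds \right] \tag{C.12}$$

Equations (C.7) and (C.12) are just the definition of expectation,
while (C.10) follows from the definition of conditional expectation.
Equation (C.8) is due to the fact that $I_A(\omega)$ is independent of s. Since
the product of two measurable functions is measurable [R2], the integrands
in (C.8) and (C.10) are $\mathscr{B} \times \mathscr{F}$-measurable. Thus the application of
Lemma C.1 (b) to (C.10) yields (C.11), because of assumption (c). Notice
that

$$\int_0^t \left( \int_\Omega I_A(\omega)\, \big| E^{\mathscr{G}}[f_s(\omega)] \big|\, dP(\omega) \right) ds \le \int_0^t \int_\Omega I_A(\omega)\, E^{\mathscr{G}}[|f_s(\omega)|]\, dP(\omega)\, ds = \int_0^t \left( \int_\Omega I_A(\omega)\, |f_s(\omega)|\, dP(\omega) \right) ds < \infty \tag{C.13}$$

Hence, Lemma C.1 (b) also implies (C.9).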
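On a finite probability space the interchange justified by Theorem C.1 can be verified exactly. The sketch below is an illustration only: it assumes a four-point sample space, a sub-$\sigma$-field generated by a two-cell partition, and an integrand that is linear in s, so that the midpoint rule integrates it exactly in rational arithmetic. All names and numerical values are hypothetical.

```python
from fractions import Fraction as F

prob = [F(1, 8), F(3, 8), F(1, 4), F(1, 4)]   # P on a four-point sample space
cells = [[0, 1], [2, 3]]                      # partition generating the sub-sigma-field
a = [F(1), F(-2), F(3), F(0)]
b = [F(0), F(1), F(1), F(5)]

def f(s, w):
    """Integrand f_s(w) = a(w) * s + b(w), linear in s."""
    return a[w] * s + b[w]

def cond_exp(values):
    """Conditional expectation given the partition: the P-weighted average on each cell."""
    out = [None] * len(prob)
    for cell in cells:
        mass = sum(prob[w] for w in cell)
        avg = sum(prob[w] * values[w] for w in cell) / mass
        for w in cell:
            out[w] = avg
    return out

def integrate(g, t, n=50):
    """Midpoint rule on [0, t]; exact for integrands that are linear in s."""
    h = F(t) / n
    return sum(g((i + F(1, 2)) * h) for i in range(n)) * h

t = F(2)
omega = range(len(prob))

# Conditional expectation of the time integral.
lhs = cond_exp([integrate(lambda s, w=w: f(s, w), t) for w in omega])
# Time integral of the conditional expectation.
rhs = [integrate(lambda s, w=w: cond_exp([f(s, v) for v in omega])[w], t)
       for w in omega]
print(lhs == rhs, lhs)
```

In this finite setting the equality is just linearity of the weighted average; the content of the theorem is that the same interchange remains valid under the measurability and integrability hypotheses (a)-(c).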

A similar result holds for the interchange of conditional expectation
with multiple integrals over $([0,t] \times \cdots \times [0,t],\ \mathscr{B} \times \cdots \times \mathscr{B},\ m \times \cdots \times m)$.

It can easily be shown that the application of this theorem is
justified in Chapter 5 and Appendix D, since the integrands are just
products of Gaussian random processes.
APPENDIX D


PROOFS OF THEOREMS 5.1 AND 5.2


D.1 Preliminary Results

In this section we present some preliminary results which are
crucial in the proofs of Theorems 5.1 and 5.2. The first lemma follows
easily from some identities of Miller [M11].

Lemma D.1: Let $x = [x_1, \ldots, x_k]'$ be a Gaussian random vector with
mean m, covariance matrix P, and characteristic function $M_x$. Then, if
$\ell \le k$,

$$\frac{\partial^\ell M_x(u_1, \ldots, u_k)}{\partial u_1 \cdots \partial u_\ell} = \Big\{ \zeta_1 \zeta_2 \cdots \zeta_\ell - \sum P_{j_1 j_2}\, \zeta_{j_3} \cdots \zeta_{j_\ell} + \sum P_{j_1 j_2} P_{j_3 j_4}\, \zeta_{j_5} \cdots \zeta_{j_\ell} - \cdots \Big\}\, M_x(u_1, \ldots, u_k) \tag{D.1}$$

where

$$\zeta_j = i\, m_j - \sum_{n=1}^{k} u_n P_{jn} \tag{D.2}$$

and the sums in (D.1) are over all possible combinations of pairs of the
$\{j_i,\ i = 1, \ldots, \ell\}$. Also,

$$E[x_1 x_2 \cdots x_k] = E[x_k]\, E[x_1 x_2 \cdots x_{k-1}] + \sum_{j_1=1}^{k-1} P_{j_1 k}\, E[x_{j_2} \cdots x_{j_{k-1}}] \tag{D.3a}$$

$$= E[x_1 \cdots x_i]\, E[x_{i+1} \cdots x_k] + \sum P_{j_1 \ell_{i+1}}\, E[x_{j_2} \cdots x_{j_i}]\, E[x_{\ell_{i+2}} \cdots x_{\ell_k}] + \sum P_{j_1 \ell_{i+1}} P_{j_2 \ell_{i+2}}\, E[x_{j_3} \cdots x_{j_i}]\, E[x_{\ell_{i+3}} \cdots x_{\ell_k}] + \cdots \tag{D.3b}$$

$$= m_1 \cdots m_k + \sum P_{j_1 j_2}\, m_{j_3} \cdots m_{j_k} + \sum P_{j_1 j_2} P_{j_3 j_4}\, m_{j_5} \cdots m_{j_k} + \cdots \tag{D.3c}$$

where the sums in (D.3b, c) are defined as in (D.1); also, in (D.3b),
$\{j_\alpha,\ \alpha = 1, \ldots, i\}$ is a permutation of $\{1, \ldots, i\}$ and $\{\ell_\alpha,\ \alpha = i+1, \ldots, k\}$
is a permutation of $\{i+1, \ldots, k\}$.
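The recursion (D.3a) gives a direct way to evaluate product moments of a Gaussian vector with nonzero mean. The following sketch applies it recursively; it is an added illustration, with the function name `gaussian_moment` and the numerical values chosen only for the example. In the sum, the index $j_1$ runs over the first k-1 factors and the remaining expectation is taken over the factors left after removing $x_{j_1}$ and $x_k$.

```python
def gaussian_moment(indices, m, P):
    """E[x_{i1} x_{i2} ... ] for a Gaussian vector with mean m and covariance P,
    computed with the recursion (D.3a). Repeated indices are allowed."""
    if not indices:
        return 1
    *rest, k = indices
    # First term: E[x_k] * E[product of the remaining factors].
    total = m[k] * gaussian_moment(tuple(rest), m, P)
    # Second term: pair x_k with each remaining factor x_j in turn.
    for pos, j in enumerate(rest):
        others = tuple(rest[:pos] + rest[pos + 1:])
        total += P[j][k] * gaussian_moment(others, m, P)
    return total

# Scalar case with mean 2 and variance 3.
m1, P1 = [2], [[3]]
print(gaussian_moment((0, 0), m1, P1))       # m^2 + P
print(gaussian_moment((0, 0, 0), m1, P1))    # m^3 + 3 m P

# Zero-mean bivariate case: E[x1^2 x2^2] = P11 P22 + 2 P12^2.
m2, P2 = [0, 0], [[2, 1], [1, 3]]
print(gaussian_moment((0, 0, 1, 1), m2, P2))
```

For zero mean the recursion reproduces the pairing expansion (D.3c), in which only products of covariances survive.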


In the remainder of this appendix it will be assumed that $\xi$ and z
are Gauss-Markov processes satisfying (5.1) and (5.3), respectively. We
now define classes of random processes which occur as the jth order term
in a Volterra series expansion in $\xi$ with separable kernels (see Section
5.1), and we prove some lemmas relating these to other relevant processes.

Definition D.1: The space $\Lambda_j$ of Volterra terms of order j is the
vector space over R consisting of all scalar-valued random processes $\lambda_j$
of the form

$$\lambda_j(t) = \sum_{i=1}^{N} \gamma_0^i(t)\, \lambda_j^i(t) \tag{D.4}$$

where

$$\lambda_j^i(t) = \int_0^t \int_0^{\sigma_1} \cdots \int_0^{\sigma_{j-1}} \gamma_1^i(\sigma_1) \cdots \gamma_j^i(\sigma_j)\, \xi_{k_1}(\sigma_1) \cdots \xi_{k_j}(\sigma_j)\, d\sigma_1 \cdots d\sigma_j \tag{D.5}$$

where for each i, $\{\xi_{k_\ell}\}$ are not necessarily distinct elements
of $\xi$, and $\{\gamma_\ell^i\}$ are locally bounded, piecewise continuous, deterministic
functions of time. We denote by $\hat\Lambda_j$ the space of all processes
$\hat\lambda_j(t|t) = E[\lambda_j(t) \mid z^t]$, where $\lambda_j \in \Lambda_j$.

The next lemma, which is due to Brockett [B27], shows that terms of
the form (5.9) with i < j (more integrals than $\xi_k$'s) are in fact elements
of $\Lambda_i$.

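Lemma D.2: Let $\xi$ satisfy (5.1), and consider the scalar-valued
process

$$\eta(t) = \int_0^t \int_0^{\sigma_1} \cdots \int_0^{\sigma_{j-1}} \gamma_1(\sigma_1) \cdots \gamma_j(\sigma_j)\, \xi_{k_1}(\sigma_{m_1}) \cdots \xi_{k_i}(\sigma_{m_i})\, d\sigma_1 \cdots d\sigma_j \tag{D.6}$$

where $\gamma_\ell$ are as in Definition D.1, $m_n \ne m_p$ for $n \ne p$, and i < j. Then
$\eta \in \Lambda_i$.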

Proof: It is easy to show using the construction of Brockett [B25,
Theorem 4] that $\eta(t)$ has a realization as a time-varying bilinear system

$$\dot x(t) = A(t)\,x(t) + \sum_{\ell} \xi_\ell(t)\, B_\ell(t)\, x(t) \tag{D.7}$$

$$\eta(t) = x_1(t) \tag{D.8}$$

where A(t) and $\{B_\ell(t)\}$ are strictly upper triangular matrices. The
Volterra series for (D.7) can be expressed via the Peano-Baker series
[B25], and the Volterra series is finite because A(t) and $\{B_\ell(t)\}$ are
strictly upper triangular. In fact, because the original expression (D.6)
contains only the product of i components of $\xi$, the Volterra expansion of
$\eta(t) = x_1(t)$ will contain only an ith order term

$$\eta(t) = \int_0^t \int_0^{\sigma_1} \cdots \int_0^{\sigma_{i-1}} \tilde\gamma_1(\sigma_1) \cdots \tilde\gamma_i(\sigma_i)\, \xi_{n_1}(\sigma_1) \cdots \xi_{n_i}(\sigma_i)\, d\sigma_1 \cdots d\sigma_i \tag{D.9}$$

where $\{n_\ell,\ \ell = 1, \ldots, i\}$ is a permutation of the $\{k_\ell,\ \ell = 1, \ldots, i\}$ of
(D.6). Hence $\eta \in \Lambda_i$.
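The finiteness claim in this proof rests on an elementary fact: any product of n strictly upper triangular $n \times n$ matrices vanishes, so the iterated integrals in the Peano-Baker series, whose integrands are such products, are zero beyond order n-1. A minimal sketch with arbitrary 3 x 3 integer matrices (illustrative values only, not taken from the text):

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_zero(M):
    return all(x == 0 for row in M for x in row)

# Three strictly upper triangular 3 x 3 matrices.
A1 = [[0, 1, 2], [0, 0, 3], [0, 0, 0]]
A2 = [[0, 4, 0], [0, 0, 5], [0, 0, 0]]
A3 = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]

two_factors = matmul(A1, A2)             # only the (1, 3) entry can survive
three_factors = matmul(two_factors, A3)  # every entry vanishes

print(two_factors)
print(is_zero(three_factors))
```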


Recall that the conditional cross-covariance $P(\sigma_1, \sigma_2, t)$ (defined in
(5.13)) was shown to be nonrandom in Lemma 5.1; it can be computed from
Kwakernaak's equations (5.17)-(5.19). The following lemma shows that
$P_{ij}(\sigma_1, \sigma_2, t)$ is a separable kernel.

Lemma D.3: $P_{ij}(\sigma_1, \sigma_2, t)$ is a separable kernel; i.e., it can be
expressed in the form

$$P_{ij}(\sigma_1, \sigma_2, t) = \sum_{k=1}^{m} \gamma_0^k(t)\, \gamma_1^k(\sigma_1)\, \gamma_2^k(\sigma_2) \tag{D.10}$$


Proof ; Assume <_ <_ t. Then it follows from (5.17) that, for 

arbitrary real numbers a, B, and 6, 
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P(o 1 ,o 2 ,t) = P(a 1 )¥'(a,a 1 )['P t (0 2 ,a)- f ¥» (T,a)H* (t)R 1 (T)H(T)T(x,a 2 )dT-P(a 2 ) 

a 2 

- f y T (T,a)H' (T)R“ 1 (T)H(T)T(T,6)dT‘T(6,CT 2 )P(CT 2 )] 

•71 


= A(a 1 )[B(a 2 ) + C(t)D(a 2 )] 


(D. 11) 


Hence, if $e_i$ denotes the $i$th unit vector in $R^n$, it is obvious from (D.11) that

$$P_{ij}(\sigma_1,\sigma_2,t) = e_i'\,P(\sigma_1,\sigma_2,t)\,e_j \tag{D.12}$$

has the form (D.10) for suitable functions of $\sigma_1$, $\sigma_2$, and $t$.
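The step from a matrix factorization of the form A[B + CD] to a scalar sum of products can be checked mechanically. The sketch below is illustrative only: it treats A, B, C, and D as numeric samples of the matrix-valued functions at fixed arguments (hypothetical integer values, not data from the text) and confirms that the (i, j) entry equals a sum of n + n^2 products, each containing one factor from A, one from B or D, and at most one from C.

```python
def entry_direct(A, B, C, D, i, j):
    """(i, j) entry of A (B + C D), computed by forming the matrices."""
    n = len(A)
    CD = [[sum(C[r][k] * D[k][c] for k in range(n)) for c in range(n)]
          for r in range(n)]
    M = [[B[r][c] + CD[r][c] for c in range(n)] for r in range(n)]
    return sum(A[i][k] * M[k][j] for k in range(n))


def entry_separable(A, B, C, D, i, j):
    """Same entry written as an explicit finite sum of scalar products.

    Returns the value of the sum and the number of terms in it.
    """
    n = len(A)
    terms = [A[i][k] * B[k][j] for k in range(n)]
    terms += [A[i][k] * C[k][l] * D[l][j] for k in range(n) for l in range(n)]
    return sum(terms), len(terms)


# Hypothetical 2 x 2 samples standing in for the four matrix-valued functions.
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 2]]
C = [[2, 0], [1, 3]]
D = [[4, 1], [0, 2]]

for i in range(2):
    for j in range(2):
        value, count = entry_separable(A, B, C, D, i, j)
        print(i, j, value == entry_direct(A, B, C, D, i, j), count)
```

In this reading the number of terms m in the separable form is at most n + n^2 for an n-dimensional state.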


The next lemma proves that certain processes which occur in the proof of Theorem 5.1 are elements of the classes $\Lambda_j$.


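Lemma D.4: Let $\xi$ satisfy (5.1), and consider the scalar-valued process

$$
\begin{aligned}
\eta(t) = \int_0^t\!\int_0^{\sigma_1}\!\cdots\!\int_0^{\sigma_{j-1}} & P_{n_1 n_2}(\sigma_{m_1},\sigma_{m_2},t)\cdots P_{n_{\ell-1} n_\ell}(\sigma_{m_{\ell-1}},\sigma_{m_\ell},t) \\
& \cdot\,\gamma_1(\sigma_1)\cdots\gamma_j(\sigma_j)\,\xi_{n_{\ell+1}}(\sigma_{m_{\ell+1}})\cdots\xi_{n_i}(\sigma_{m_i})\,d\sigma_1\cdots d\sigma_j
\end{aligned}
\tag{D.13}
$$

where the $n_k$ are arbitrary integers in $\{1,\dots,n\}$ and the $P_{n_{k_1} n_{k_2}}$ are arbitrary elements of $P$. Then $\eta \in \Lambda_{i-\ell}$.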


Proof: Since we have shown in Lemma D.3 that each $P_{n_{k_1} n_{k_2}}(\sigma_{m_{k_1}},\sigma_{m_{k_2}},t)$ is a separable kernel, the kernel of the integral (D.13) is also a separable kernel. Hence $\eta \in \Lambda_{i-\ell}$.


Lemma D.4 implies that if $\hat{\lambda}(t|t)$ can be computed with a finite dimensional estimator for all $\lambda \in \Lambda_{i-\ell}$, then $\hat{\eta}(t|t)$ (where $\eta$ is defined by (D.13)) is also "finite dimensionally computable" (FDC).

D.2 Proofs of Theorems 5.1 and 5.2

The proofs of these two theorems are almost identical. We will 
prove Theorem 5.1; then we will explain how this proof is modified to 
prove Theorem 5.2. 

Proof of Theorem 5.1: As stated in Section 5.2, we consider the $j$th order Volterra term

$$\eta(t) = \int_0^t\!\int_0^{\sigma_1}\!\cdots\!\int_0^{\sigma_{j-1}} \gamma_1(\sigma_1)\cdots\gamma_j(\sigma_j)\,\xi_{k_1}(\sigma_1)\cdots\xi_{k_j}(\sigma_j)\,d\sigma_1\cdots d\sigma_j \tag{D.14}$$

The theorem is proved by induction on $j$, the order of the Volterra term. The proof for $j=1$ is presented in Section 5.2. We now assume the theorem holds for $j \le i-1$ (i.e., we assume that $E^t[e^{\xi_\ell(t)}\eta(t)]$ is FDC, where $\eta \in \Lambda_j$, for $j \le i-1$), and prove that it holds for $j=i$.

The proof is in two steps. We first reduce the problem to the computation of the elements of $\hat{\Lambda}_i$ (see Definition D.1). We then show by induction that all of the processes in $\hat{\Lambda}_i$ can be computed with finite dimensional estimators.
(i) We first consider the computation of $\hat{x}(t|t)$, where

$$x(t) = e^{\xi_\ell(t)}\,\eta(t) \tag{D.15}$$

Now

$$\hat{x}(t|t) = \int_0^t\!\int_0^{\sigma_1}\!\cdots\!\int_0^{\sigma_{i-1}} \gamma_1(\sigma_1)\cdots\gamma_i(\sigma_i)\,E^t\big[e^{\xi_\ell(t)}\,\xi_{k_1}(\sigma_1)\cdots\xi_{k_i}(\sigma_i)\big]\,d\sigma_1\cdots d\sigma_i \tag{D.16}$$


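By equation (D.1) and the definition of the characteristic function, it follows that

$$
\begin{aligned}
E^t\big[e^{\xi_\ell(t)}\,\xi_{k_1}(\sigma_1)\cdots\xi_{k_i}(\sigma_i)\big] = e^{\hat{\xi}_\ell(t|t) + \frac{1}{2}P_{\ell\ell}(t)}\Big\{ & \delta_{k_1}(\sigma_1)\cdots\delta_{k_i}(\sigma_i) \\
& + \sum P_{j_1 j_2}(\sigma_{m_1},\sigma_{m_2},t)\,\delta_{j_3}(\sigma_{m_3})\cdots\delta_{j_i}(\sigma_{m_i}) + \cdots \Big\}
\end{aligned}
\tag{D.17}
$$

where

$$\delta_{j_\alpha}(\sigma_{m_\alpha}) = \hat{\xi}_{j_\alpha}(\sigma_{m_\alpha}|t) + P_{\ell j_\alpha}(t,\sigma_{m_\alpha},t) \tag{D.18}$$

and $\{j_\alpha,\ \alpha = 1,\dots,i\}$ is a permutation of $\{k_\alpha,\ \alpha = 1,\dots,i\}$.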

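Equation (D.3) implies that (D.17) can be rewritten as

$$
\begin{aligned}
E^t\big[e^{\xi_\ell(t)}\,\xi_{k_1}(\sigma_1)\cdots\xi_{k_i}(\sigma_i)\big] = e^{\hat{\xi}_\ell(t|t) + \frac{1}{2}P_{\ell\ell}(t)}\Big\{ & E^t\big[\xi_{k_1}(\sigma_1)\cdots\xi_{k_i}(\sigma_i)\big] \\
& + \sum P_{\ell j_1}(t,\sigma_{m_1},t)\,E^t\big[\xi_{j_2}(\sigma_{m_2})\cdots\xi_{j_i}(\sigma_{m_i})\big] \\
& + \cdots + P_{\ell k_1}(t,\sigma_1,t)\cdots P_{\ell k_i}(t,\sigma_i,t) \Big\}
\end{aligned}
\tag{D.19}
$$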


Hence, Lemmas D.2 and D.4 imply that the computation of $\hat{x}(t|t)$ involves only the computation of elements in $\hat{\Lambda}_j$, $j=1,\dots,i$. However, the induction hypothesis implies that the elements of $\hat{\Lambda}_j$, $j=1,\dots,i-1$, are FDC, so we need only prove that the elements of $\hat{\Lambda}_i$ are FDC.
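Step (i) relies on a property of jointly Gaussian variables: multiplying by $e^{x}$ inside an expectation shifts each mean by its covariance with $x$ and scales the result by $E[e^{x}]$. The sketch below is a numerical illustration under stated assumptions, not the thesis's derivation: the function names, the variable names, and the sample numbers are all hypothetical. It evaluates the moment twice, once with shifted means (our reading of the structure of (D.17)-(D.18)) and once as a sum over subsets of covariance factors times ordinary moments (our reading of (D.19)), and the two agree.

```python
import math
from itertools import combinations


def gaussian_moment(means, cov, idx):
    """E[prod_{k in idx} y_k] for jointly Gaussian y, using the recursion
    E[y_a * rest] = m_a * E[rest] + sum_b C_ab * E[rest without y_b]."""
    if not idx:
        return 1.0
    a, rest = idx[0], idx[1:]
    total = means[a] * gaussian_moment(means, cov, rest)
    for p, b in enumerate(rest):
        total += cov[a][b] * gaussian_moment(means, cov, rest[:p] + rest[p + 1:])
    return total


def tilted_moment(mx, vx, cxy, means, cov):
    """E[exp(x) * prod_k y_k], x ~ N(mx, vx), cov(x, y_k) = cxy[k],
    computed by shifting every mean by cxy[k]."""
    shifted = [m + c for m, c in zip(means, cxy)]
    idx = tuple(range(len(means)))
    return math.exp(mx + 0.5 * vx) * gaussian_moment(shifted, cov, idx)


def subset_expansion(mx, vx, cxy, means, cov):
    """Same quantity as a sum over subsets S of
    prod_{k in S} cxy[k] * E[prod_{k not in S} y_k]."""
    n = len(means)
    total = 0.0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            rest = tuple(k for k in range(n) if k not in S)
            total += math.prod(cxy[k] for k in S) * gaussian_moment(means, cov, rest)
    return math.exp(mx + 0.5 * vx) * total


# Hypothetical sample values for three variables y_1, y_2, y_3.
means = [0.3, -1.2, 0.8]
cov = [[1.0, 0.2, -0.1], [0.2, 2.0, 0.4], [-0.1, 0.4, 0.5]]
cxy = [0.5, -0.3, 0.1]
print(abs(tilted_moment(0.1, 0.7, cxy, means, cov)
          - subset_expansion(0.1, 0.7, cxy, means, cov)) < 1e-12)
```

The subset form makes the reduction visible: every term is a deterministic factor times an ordinary conditional moment of lower or equal order.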

(ii) Assume that $\eta \in \Lambda_i$ is defined by (D.14) (where $j=i$). Then the nonlinear filtering equation (1.7)-(1.8) for $\hat{\eta}(t|t)$ is

$$d\hat{\eta}(t|t) = E^t\big[\gamma_1(t)\,\xi_{k_1}(t)\,\lambda(t)\big]\,dt + \big\{E^t[\eta(t)\xi'(t)] - \hat{\eta}(t|t)\,\hat{\xi}'(t|t)\big\}\,H'(t)R^{-1}(t)\,d\nu(t) \tag{D.20}$$

where

$$d\nu(t) = dz(t) - H(t)\,\hat{\xi}(t|t)\,dt \tag{D.21}$$

and

$$\lambda(t) = \int_0^t\!\int_0^{\sigma_2}\!\cdots\!\int_0^{\sigma_{i-1}} \gamma_2(\sigma_2)\cdots\gamma_i(\sigma_i)\,\xi_{k_2}(\sigma_2)\cdots\xi_{k_i}(\sigma_i)\,d\sigma_2\cdots d\sigma_i \tag{D.22}$$

is an element of $\Lambda_{i-1}$; thus, by the induction hypothesis, $\hat{\lambda}(t|t)$ is FDC.

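The first term in (D.20) (the drift term) is (see (D.3a))

$$
\begin{aligned}
E^t\big[\gamma_1(t)\,\xi_{k_1}(t)\,\lambda(t)\big] = {} & \gamma_1(t)\,\hat{\xi}_{k_1}(t|t)\,\hat{\lambda}(t|t) \\
& + \gamma_1(t)\sum_{\alpha=2}^{i} E^t\Big[\int_0^t\!\int_0^{\sigma_2}\!\cdots\!\int_0^{\sigma_{i-1}} P_{k_1 k_\alpha}(t,\sigma_\alpha,t)\,\gamma_2(\sigma_2)\cdots\gamma_i(\sigma_i) \prod_{\substack{\beta=2 \\ \beta\neq\alpha}}^{i} \xi_{k_\beta}(\sigma_\beta)\,d\sigma_2\cdots d\sigma_i\Big]
\end{aligned}
\tag{D.23}
$$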


The first term in (D.23) is FDC by the induction hypothesis, and the second term, by Lemmas D.2 and D.4, is also FDC (i.e., it is an element of $\hat{\Lambda}_{i-2}$).

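Equation (D.3a) implies that the gain term in (D.20) is the row vector (here $P_k(\sigma,t,t)$ denotes the $k$th row of $P(\sigma,t,t)$)

$$
\begin{aligned}
E^t[\eta(t)\xi'(t)] - \hat{\eta}(t|t)\,\hat{\xi}'(t|t) = \sum_{\ell=1}^{i} E^t\Big[\int_0^t\!\int_0^{\sigma_1}\!\cdots\!\int_0^{\sigma_{i-1}} & \gamma_1(\sigma_1)\cdots\gamma_i(\sigma_i)\,\xi_{k_1}(\sigma_1)\cdots\xi_{k_{\ell-1}}(\sigma_{\ell-1}) \\
& \cdot\,\xi_{k_{\ell+1}}(\sigma_{\ell+1})\cdots\xi_{k_i}(\sigma_i)\,P_{k_\ell}(\sigma_\ell,t,t)\,d\sigma_1\cdots d\sigma_i\Big]
\end{aligned}
\tag{D.24}
$$

each element of which, by Lemmas D.2 and D.4, is an element of $\hat{\Lambda}_{i-1}$. Thus, by the induction hypothesis, the gain term, and hence the nonlinear equation (D.20) for $\hat{\eta}(t|t)$, is FDC. This completes the proof of Theorem 5.1.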
Proof of Theorem 5.2: This proof is identical to the proof of Theorem 5.1, except for the computation of the drift term in (D.20), so we will consider only that aspect of the proof. Assume that $\eta$ is defined as in (5.9); i.e., $\eta$ is given by

$$\eta(t) = \int_0^t\!\int_0^{\sigma_1}\!\cdots\!\int_0^{\sigma_{j-1}} \gamma_1(\sigma_1)\cdots\gamma_j(\sigma_j)\,\xi_{k_1}(\sigma_{m_1})\cdots\xi_{k_i}(\sigma_{m_i})\,d\sigma_1\cdots d\sigma_j \tag{D.25}$$

where $i > j$; we also assume that $m_1 = \dots = m_\alpha = 1$ and $m_\beta \neq 1$ for $\beta > \alpha$.

In this proof, the induction is on $j$, the number of integrals in (D.25). That is, we assume that the theorem is true when $\eta$ contains $\le j-1$ integrals, and prove that the theorem holds if $\eta$ contains $j$ integrals.


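The nonlinear filtering equation yields

$$d\hat{\eta}(t|t) = E^t\big[\gamma_1(t)\,\xi_{k_1}(t)\cdots\xi_{k_\alpha}(t)\,\lambda(t)\big]\,dt + \big\{E^t[\eta(t)\xi'(t)] - \hat{\eta}(t|t)\,\hat{\xi}'(t|t)\big\}\,H'(t)R^{-1}(t)\,d\nu(t) \tag{D.26}$$

where $d\nu$ is defined in (D.21) and

$$\lambda(t) = \int_0^t\!\int_0^{\sigma_2}\!\cdots\!\int_0^{\sigma_{j-1}} \gamma_2(\sigma_2)\cdots\gamma_j(\sigma_j)\,\xi_{k_{\alpha+1}}(\sigma_{m_{\alpha+1}})\cdots\xi_{k_i}(\sigma_{m_i})\,d\sigma_2\cdots d\sigma_j \tag{D.27}$$

The drift term in (D.26) is, from (D.3b),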
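$$
\begin{aligned}
E^t\big[\gamma_1(t)\,\xi_{k_1}(t)\cdots\xi_{k_\alpha}(t)\,\lambda(t)\big] = {} & \gamma_1(t)\,E^t\big[\xi_{k_1}(t)\cdots\xi_{k_\alpha}(t)\big]\,\hat{\lambda}(t|t) \\
& + \gamma_1(t)\sum E^t\big[\xi_{\ell_1}(t)\cdots\xi_{\ell_{\alpha-1}}(t)\big] \\
& \quad\cdot E^t\Big\{\int_0^t\!\int_0^{\sigma_2}\!\cdots\!\int_0^{\sigma_{j-1}} \gamma_2(\sigma_2)\cdots\gamma_j(\sigma_j)\,P_{\ell_\alpha \ell_{\alpha+1}}(t,\sigma_{m_{\alpha+1}},t) \\
& \qquad\cdot\,\xi_{\ell_{\alpha+2}}(\sigma_{m_{\alpha+2}})\cdots\xi_{\ell_i}(\sigma_{m_i})\,d\sigma_2\cdots d\sigma_j\Big\} + \cdots
\end{aligned}
\tag{D.28}
$$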


where $\{\ell_1,\dots,\ell_\alpha\}$ is a permutation of $\{k_1,\dots,k_\alpha\}$ and $\{\ell_{\alpha+1},\dots,\ell_i\}$ is a permutation of $\{k_{\alpha+1},\dots,k_i\}$. The first term of (D.28) is FDC by the induction hypothesis, and the other terms, by Lemmas D.2 and D.4 and the induction hypothesis, are also FDC. We have also used the fact that the conditional distribution of $\xi(t)$ given $z^t$ is Gaussian (Lemma 5.1) in order to conclude that $E^t[\xi_{\ell_1}(t)\cdots\xi_{\ell_\alpha}(t)]$ can be computed (via (D.3c)) as a memoryless function of $\hat{\xi}(t|t)$ and $P(t)$.

The gain term in (D.26) is also FDC; the proof is identical to that of Theorem 5.1. Hence $\hat{\eta}(t|t)$ is FDC, and Theorem 5.2 is proved.